Discriminating Fault Rate and Persistency to Improve Fault Treatment

A. Bondavalli*, S. Chiaradonna**, F. Di Giandomenico** and F. Grandoni**

This paper addresses the consolidated identification of faults, distinguished as transient or permanent/intermittent, through the definition of a fault identification mechanism called α-count. The goal is to allow continued use of parts hit by transient faults, which may lead to better overall system performance if proper handling is provided. Discriminating transient faults is especially important in dependability-qualified applications where replacing or repairing failed components is costly, difficult or outright impossible (as on computer-guided space probes). The α-count mechanism tries to balance two conflicting requirements: keeping in the system those components that have experienced only transient faults, and quickly removing those affected by permanent or intermittent faults. The delay in spotting faulty components and the probability of improperly blaming correct ones are evaluated as α-count's figures of merit. The approach is compared with some heuristics developed to deal with the same problem.

Keywords: Fault Persistency Discrimination, Fault Treatment, Scoring Functions, Threshold-based Identification, Modelling and Evaluation.


