Fault-tolerant systems

Journal Title, Volume, Page:

Journal of Parallel and Distributed Computing, Special Issue on Petri Net Models of Parallel Computers, vol. 15, no. 3, July 1992, pp. 238-254

Year of Publication:

1992

Link:

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.3928&rank=1

Authors:

Luai M. Malhis

Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ 85721

Current Affiliation:

Department of Computer Engineering, An-Najah National University, Palestine

William H. Sanders

Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ 85721

Preferred Abstract (Original):

Dependability evaluation is an important, but difficult, aspect of the design of fault-tolerant parallel and distributed computing systems. One possible technique is to use Markov models, but if applied directly to realistic designs, this often results in large and intractable models. Many authors have investigated methods to avoid this explosive state-space growth, but have typically either solved the problem for a specific system design, or required manipulation of the model at the state-space level. Stochastic activity networks (SANs), a stochastic extension of Petri nets, together with recently developed reduced base model construction techniques, have the potential to avoid this state-space growth at the SAN level for many parallel and distributed systems. This paper investigates this claim by considering their application to three different systems: a fault-tolerant parallel computing system, a distributed database architecture, and a multiprocessor-multimemory system. We show that this method does indeed result in tractable Markov models for these systems, and argue that it can be applied to the dependability evaluation of many parallel and distributed systems.

Dependability Evaluation Using Composed SAN-Based Reward Models