Abstract
One trend in the development of high-speed computers is highly parallel processing with a large number of processing elements, a scheme that recent progress in VLSI technology is making increasingly realistic. As the number of processing elements grows, however, so does the problem of coping with faults. Fault-tolerant computers with multiple redundancy have been developed, but no method has been presented for the parallel computer environment that provides sufficient redundancy against faults, recovers from them, and continues the computation without bringing the system down. In general, a fault destroys the completeness of data. In numerical computation, however, some problems have less stringent requirements for data completeness (e.g., the iterative solution of a system of equations). This paper discusses the case where such a problem is solved on a parallel computer with a lattice topology. Three structural types are proposed for dynamic fault recovery during execution, together with their interconnection and recovery methods. Evaluation results obtained by simulation are presented.
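
The remark that iterative solvers have less stringent requirements for data completeness can be illustrated with a small sketch. It is not the paper's method or architecture: it is a plain sequential Jacobi iteration, and the function name `jacobi_with_fault`, the parameters `fault_at` and `fault_index`, and the 3x3 test system are all assumptions made for illustration. A "fault" is modeled simply as one component losing its value mid-run and being reset to zero.

```python
def jacobi_with_fault(a, b, iterations, fault_at=None, fault_index=0):
    """Jacobi iteration for a x = b.

    If fault_at is given, the component x[fault_index] is reset to 0.0 just
    before that step, mimicking (hypothetically) a processing element that
    loses its local data and restarts from a default value.
    """
    n = len(b)
    x = [0.0] * n
    for step in range(iterations):
        if step == fault_at:
            x[fault_index] = 0.0  # assumed fault model: local value is lost
        # Every component is updated from the previous iterate only.
        x = [
            (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
            for i in range(n)
        ]
    return x


# Strictly diagonally dominant system, so Jacobi converges from any state.
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
B = [2.0, 4.0, 10.0]  # exact solution is x = [1, 2, 3]

clean = jacobi_with_fault(A, B, iterations=60)
faulty = jacobi_with_fault(A, B, iterations=60, fault_at=20, fault_index=1)

print("without fault:", clean)
print("with fault:   ", faulty)
```

Because the iteration converges from any starting state for this kind of system, the lost value only sets the run back by some iterations; the computation continues and reaches the same solution without a restart.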
Original language | English |
---|---|
Pages (from-to) | 10-18 |
Number of pages | 9 |
Journal | Systems and Computers in Japan |
Volume | 17 |
Issue number | 7 |
Publication status | Published - 1986 |
ASJC Scopus subject areas
- Theoretical Computer Science
- Information Systems
- Hardware and Architecture
- Computational Theory and Mathematics