
13.3   Dependable system architectures  353



                     [Figure 13.6 Triple modular redundancy: the input feeds three units (A1, A2, A3), whose results go to an output selector.]



                     [Figure 13.7 N-version programming: the input feeds N software versions (Version 1, Version 2, Version 3); an output selector with a fault manager produces the agreed result.]

                                         Of course, the components could all have a common design fault and thus all pro-
                                       duce the same (wrong) answer. Using hardware units that have a common specifica-
                                       tion but which are designed and built by different manufacturers reduces the chances
                                       of such a common mode failure. It is assumed that the probability of different teams
                                       making the same design or manufacturing error is small.
                                         A similar approach can be used for fault-tolerant software, where N diverse
                                       versions of a software system execute in parallel (Avizienis, 1985; Avizienis, 1995).
                                       This approach to software fault tolerance, illustrated in Figure 13.7, has been used in
                                       railway signaling systems, aircraft systems, and reactor protection systems.
                                         Using a common specification, the same software system is implemented by a
                                       number of teams. These versions are executed on separate computers. Their outputs
                                       are compared using a voting system, and inconsistent outputs or outputs that are not
                                       produced in time are rejected. At least three versions of the system should be
                                       available so that two versions still agree in the event of a single failure.
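The voting step described above can be sketched as follows. This is a minimal illustration, not the implementation used in any of the systems mentioned: the function name `vote`, the use of `None` to mark an output that was not produced in time, and the strict-majority rule are all assumptions made for the example.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def vote(outputs: Sequence[Optional[Hashable]]) -> Optional[Hashable]:
    """Return the output agreed by a strict majority of versions, else None.

    Assumption: a None entry marks a version whose output missed its
    deadline, so it is rejected before the comparison.
    """
    # Reject outputs that were not produced in time.
    on_time = [o for o in outputs if o is not None]
    if not on_time:
        return None

    # Find the most common on-time output and how many versions produced it.
    value, count = Counter(on_time).most_common(1)[0]

    # Accept it only if more than half of ALL versions agree on it.
    return value if count > len(outputs) / 2 else None
```

With three versions, `vote([42, 42, 41])` and `vote([42, None, 42])` both yield the agreed result because two versions are consistent, whereas `vote([1, 2, 3])` yields no result. Counting the majority against all versions, rather than only those that answered in time, is a deliberately conservative choice for this sketch.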
                                         N-version programming may be less expensive than self-checking architectures in
                                       systems for which a high level of availability is required. However, it still requires several
                                       different teams to develop different versions of the software. This leads to very high
                                       software development costs. As a result, this approach is only used in systems where it is
                                       impractical to provide a protection system that can guard against safety-critical failures.

                               13.3.4 Software diversity
                                       All of the above fault-tolerant architectures rely on software diversity to achieve fault
                                       tolerance. This is based on the assumption that diverse implementations of the same
                                       specification (or a part of the specification, for protection systems) are independent.
                                       They should not include common errors and so will not fail in the same way, at the