
13.3   Dependable system architectures  349


                                       It can be an efficient way of using hardware if the backup server is one that is nor-
                                       mally used for low-priority tasks. If a problem occurs with a primary server, its pro-
                                       cessing is transferred to the backup server, which gives that work the highest priority.
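The handover described above can be sketched as a priority queue on the backup server. This is only an illustrative sketch: the class name `BackupServer`, the priority constants, and the task names are assumptions made for the example, not part of any particular server product.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

# Hypothetical priority levels: a lower number is served first.
FAILOVER = 0   # work taken over from a failed primary server
LOW = 10       # the backup server's normal low-priority work


@dataclass
class BackupServer:
    _queue: list = field(default_factory=list)
    _order: count = field(default_factory=count)  # tie-breaker, keeps FIFO order

    def submit(self, task: str, priority: int = LOW) -> None:
        """Queue a task; by default it is ordinary low-priority work."""
        heapq.heappush(self._queue, (priority, next(self._order), task))

    def take_over(self, primary_tasks: list[str]) -> None:
        """Accept the failed primary's work at the highest priority."""
        for task in primary_tasks:
            self.submit(task, FAILOVER)

    def run_all(self) -> list[str]:
        """Process every queued task and return them in execution order."""
        done = []
        while self._queue:
            _, _, task = heapq.heappop(self._queue)
            done.append(task)
        return done
```

In this sketch, work transferred through `take_over` always runs before the low-priority tasks that were already queued, which mirrors the behaviour described in the text.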
                                         Replicated servers provide redundancy but usually not diversity. The hardware is
                                       usually identical and the servers run the same version of the software. Therefore, they
                                       can cope with hardware failures and software failures that are localized to a single
                                       machine. They cannot cope with software design problems that cause all copies of
                                       the software to fail at the same time. To handle software design failures, a system has
                                       to include diverse software and hardware, as I have discussed in Section 13.1.
                                         Software diversity and redundancy can be implemented in a number of different
                                       architectural styles. I describe some of these in the remainder of this section.



                               13.3.1 Protection systems

                                       A protection system is a specialized system that is associated with some other sys-
                                       tem. The associated system is usually either a control system for a process, such as
                                       chemical manufacturing, or an equipment control system, such as the one on a driverless
                                       train. An example of a protection system might be a system on a train that detects if
                                       the train has gone through a red signal. If so, and there is no indication that the train
                                       control system is decelerating the train, then the protection system automatically
                                       applies the train brakes to bring it to a halt. Protection systems independently moni-
                                       tor their environment and, if the sensors indicate a problem that the controlled sys-
                                       tem is not dealing with, then the protection system is activated to shut down the
                                       process or equipment.
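The train example can be expressed as a single decision taken on each monitoring cycle. The following is a minimal sketch under stated assumptions: the `TrainReadings` fields, the use of two successive speed samples to detect deceleration, and the action names are all hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainReadings:
    """Hypothetical readings from the protection system's own sensors."""
    passed_red_signal: bool
    speed_mps: float            # current speed, metres per second
    previous_speed_mps: float   # speed at the previous sample


def is_decelerating(r: TrainReadings) -> bool:
    # Assumption: a falling speed between samples is the "indication"
    # that the train control system is already slowing the train.
    return r.speed_mps < r.previous_speed_mps


def protection_step(r: TrainReadings) -> str:
    """Return the protection system's action for one monitoring cycle."""
    # Assumption: a train that is already stationary needs no intervention.
    if r.passed_red_signal and r.speed_mps > 0 and not is_decelerating(r):
        return "APPLY_BRAKES"
    return "NO_ACTION"
```

The protection logic acts only when a problem exists and the controlled system is not dealing with it. If the control system is already braking, the sketch leaves it alone.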
                                         Figure 13.3 illustrates the relationship between a protection system and a con-
                                       trolled system. The protection system monitors both the controlled equipment and
                                       the environment. If a problem is detected, it issues commands to the actuators to shut
                                       down the system or invoke other protection mechanisms such as opening a pressure-
                                       release valve. Notice that there are two sets of sensors. One set is used for normal
                                       system monitoring and the other specifically for the protection system. In the event
                                       of sensor failure, there are backups that will allow the protection system to continue
                                       in operation. There may also be redundant actuators in the system.
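The role of backup sensors and redundant actuators can be illustrated with a short sketch. Everything named here is an assumption for the example: sensors are modelled as functions that return `None` when they have failed, actuators as functions that return `True` on success, and `PRESSURE_LIMIT` is an arbitrary threshold. Treating "no working sensor" as a problem is also a design choice of this sketch, not something the text prescribes.

```python
from typing import Callable, Optional, Sequence

Sensor = Callable[[], Optional[float]]  # returns None if the sensor has failed
Actuator = Callable[[], bool]           # returns True if the command succeeded

PRESSURE_LIMIT = 100.0  # hypothetical safety threshold


def read_first_working(sensors: Sequence[Sensor]) -> Optional[float]:
    """Return the reading of the first sensor that has not failed."""
    for sensor in sensors:
        value = sensor()
        if value is not None:
            return value
    return None


def protect(sensors: Sequence[Sensor], valves: Sequence[Actuator]) -> str:
    """Open a pressure-release valve if the protection sensors show a problem."""
    pressure = read_first_working(sensors)
    # Assumption: if every protection sensor has failed, act as if unsafe.
    if pressure is None or pressure > PRESSURE_LIMIT:
        for open_valve in valves:  # try redundant actuators in turn
            if open_valve():
                return "VALVE_OPENED"
        return "ACTUATION_FAILED"
    return "OK"
```

In this sketch, a failed primary sensor or a stuck valve does not stop the protection system, because it simply moves on to the next backup.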
                                         A protection system includes only the critical functionality that is required to
                                       move the system from a potentially unsafe state to a safe state (system shutdown). It
                                       is an instance of a more general fault-tolerant architecture in which a principal sys-
                                       tem is supported by a smaller and simpler backup system that only includes essential
                                       functionality. For example, the U.S. space shuttle control software has a backup sys-
                                       tem that includes ‘get you home’ functionality; that is, the backup system can land
                                       the vehicle if the principal control system fails.
                                         The advantage of this kind of architecture is that protection system software can be
                                       much simpler than the software that is controlling the protected process. The only
                                       function of the protection system is to monitor operation and to ensure that the system
                                       is brought to a safe state in the event of an emergency. Therefore, it is possible to invest
                                       more effort in fault avoidance and fault detection. You can check that the software