It can be an efficient way of using hardware if the backup server is one that is
normally used for low-priority tasks. If a problem occurs with a primary server, its
processing is transferred to the backup server, which gives that work the highest priority.
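The failover policy described above can be sketched as a priority queue on the backup server. This is a minimal illustration, not a production scheduler: the `BackupServer` class, its method names and the two priority levels are hypothetical, and lower numbers are assumed to mean higher priority.

```python
import heapq
import itertools

# Hypothetical priority levels: a lower number means a higher priority.
HIGHEST = 0
LOW = 10

class BackupServer:
    """Sketch of a backup server that runs low-priority work until a
    primary fails, then gives the transferred work the highest priority."""

    def __init__(self):
        self._queue = []               # heap of (priority, seq, task_name)
        self._seq = itertools.count()  # tie-breaker: FIFO order within a priority

    def submit(self, task_name, priority=LOW):
        heapq.heappush(self._queue, (priority, next(self._seq), task_name))

    def take_over(self, primary_tasks):
        # Work transferred from a failed primary jumps ahead of local jobs.
        for task_name in primary_tasks:
            self.submit(task_name, priority=HIGHEST)

    def next_task(self):
        return heapq.heappop(self._queue)[2] if self._queue else None


backup = BackupServer()
backup.submit("nightly-report")
backup.submit("log-compaction")
backup.take_over(["process-orders", "update-inventory"])
order = [backup.next_task() for _ in range(4)]
print(order)
# ['process-orders', 'update-inventory', 'nightly-report', 'log-compaction']
```

The low-priority jobs are not discarded; they simply wait until the transferred work has been served.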
Replicated servers provide redundancy but usually not diversity. The hardware is
usually identical, and the servers run the same version of the software. Therefore,
they can cope with hardware failures and with software failures that are localized to a
single machine. They cannot cope with software design problems that cause all copies
of the software to fail at the same time. To handle software design failures, a system
has to include diverse software and hardware, as I discussed in Section 13.1.
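To make the point about common-mode design failures concrete, the following sketch feeds the same input to three identical replicas and to three diverse versions, and combines the results with a simple majority voter. The voter is a technique associated with N-version programming and is used here only for illustration; the conversion functions and the design fault injected into `speed_ms_v1` are hypothetical.

```python
from collections import Counter

def speed_ms_v1(kmh):
    # Hypothetical design fault: divides by 3.0 instead of 3.6.
    return round(kmh / 3.0, 2)

def speed_ms_v2(kmh):
    # Independently developed version: km/h -> m/h -> m/s.
    return round(kmh * 1000 / 3600, 2)

def speed_ms_v3(kmh):
    # A third, differently written version.
    return round(kmh / 3.6, 2)

def majority_vote(results):
    """Return the value produced by a strict majority, or None if there is none."""
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) // 2 else None

def run_replicas(versions, kmh):
    return majority_vote([version(kmh) for version in versions])

replicated = [speed_ms_v1] * 3                   # redundancy without diversity
diverse = [speed_ms_v1, speed_ms_v2, speed_ms_v3]

print(run_replicas(replicated, 90))  # 30.0: every copy agrees on the same wrong answer
print(run_replicas(diverse, 90))     # 25.0: the faulty version is outvoted
```

Because the replicas share the same design, the voter sees unanimous agreement and cannot detect the fault; only the diverse versions give it a chance to mask the error.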
Software diversity and redundancy can be implemented in several different
architectural styles. I describe some of these in the remainder of this section.
13.3.1 Protection systems
A protection system is a specialized system that is associated with some other
system. This other system is usually a control system, either for a process, such as a
chemical manufacturing process, or for equipment, such as the control system on a
driverless train. An example of a protection system is a system on a train that detects
whether the train has passed a red signal. If it has, and there is no indication that the
train control system is decelerating the train, the protection system automatically
applies the train brakes to bring it to a halt. Protection systems independently monitor
their environment. If the sensors indicate a problem that the controlled system is not
dealing with, the protection system is activated to shut down the process or equipment.
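The decision rule in the train example can be written as a small pure function. This is a sketch under stated assumptions: `TrainState`, `protection_check` and the threshold `MIN_BRAKING_DECEL` are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical threshold: a deceleration below this value (m/s^2) is taken
# to mean that the train control system is not braking.
MIN_BRAKING_DECEL = 0.5

@dataclass
class TrainState:
    """Hypothetical readings from the protection system's own sensors."""
    passed_red_signal: bool
    deceleration_ms2: float  # positive when the train is slowing down

def protection_check(state: TrainState) -> bool:
    """Return True if the protection system should apply the brakes."""
    controller_is_braking = state.deceleration_ms2 >= MIN_BRAKING_DECEL
    return state.passed_red_signal and not controller_is_braking

print(protection_check(TrainState(passed_red_signal=True, deceleration_ms2=0.0)))  # True
print(protection_check(TrainState(passed_red_signal=True, deceleration_ms2=1.2)))  # False
```

Keeping the rule free of side effects makes it easy to check in isolation; applying the brakes would be a separate step driven by the returned value.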
Figure 13.3 illustrates the relationship between a protection system and a
controlled system. The protection system monitors both the controlled equipment and
the environment. If a problem is detected, it issues commands to the actuators to shut
down the system or to invoke other protection mechanisms, such as opening a
pressure-release valve. Notice that there are two sets of sensors: one set is used for
normal system monitoring and the other specifically for the protection system. If a
sensor fails, there are backups that allow the protection system to continue
operating. There may also be redundant actuators in the system.
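The following sketch shows how a protection system might fall back to backup sensors and command every redundant actuator. The function names, the pressure limit and the decision to treat "no valid reading at all" as a fault (a fail-safe assumption) are illustrative and not taken from Figure 13.3.

```python
from typing import Callable, Optional, Sequence

# A sensor is modelled as a function returning a reading, or None if it has failed.
Sensor = Callable[[], Optional[float]]
Actuator = Callable[[], None]

def read_with_backup(sensors: Sequence[Sensor]) -> Optional[float]:
    """Return the first valid reading, falling back to backup sensors in order."""
    for read in sensors:
        value = read()
        if value is not None:
            return value
    return None  # every protection sensor has failed

def protect(sensors: Sequence[Sensor], actuators: Sequence[Actuator], limit: float) -> bool:
    """Trigger every redundant actuator if the reading exceeds the limit.

    Assumption: losing all sensors is treated as unsafe (fail-safe behaviour).
    """
    pressure = read_with_backup(sensors)
    if pressure is None or pressure > limit:
        for shut_down in actuators:
            shut_down()
        return True
    return False

log = []
sensors = [lambda: None, lambda: 12.5]  # primary sensor failed; backup still works
actuators = [lambda: log.append("valve A"), lambda: log.append("valve B")]
print(protect(sensors, actuators, limit=10.0))  # True
print(log)  # ['valve A', 'valve B']
```

Commanding every actuator, rather than only the first one, is what lets a redundant actuator take effect if another one has failed.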
A protection system includes only the critical functionality that is required to
move the system from a potentially unsafe state to a safe state (system shutdown). It
is an instance of a more general fault-tolerant architecture in which a principal
system is supported by a smaller and simpler backup system that includes only
essential functionality. For example, the U.S. space shuttle control software had a
backup system that included ‘get you home’ functionality; that is, the backup system
could land the vehicle if the principal control system failed.
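The principal/backup arrangement can be sketched as a supervisor that delegates to the full controller and switches permanently to the simpler backup when the principal fails. All class names are hypothetical, and this is a generic illustration of the architecture, not a model of how the space shuttle software actually worked.

```python
class PrincipalController:
    """Full-featured control software (hypothetical)."""

    def __init__(self):
        self.healthy = True

    def step(self, phase):
        if not self.healthy:
            raise RuntimeError("principal controller failure")
        return f"full control: {phase}"

class BackupController:
    """Smaller, simpler backup that supports only the essential function."""

    def step(self, phase):
        # Whatever the current phase, this backup only knows how to land.
        return "backup: landing"

class Supervisor:
    def __init__(self, principal, backup):
        self.principal = principal
        self.backup = backup
        self.using_backup = False

    def step(self, phase):
        if not self.using_backup:
            try:
                return self.principal.step(phase)
            except RuntimeError:
                self.using_backup = True  # switch once; never switch back
        return self.backup.step(phase)

sup = Supervisor(PrincipalController(), BackupController())
print(sup.step("orbit"))       # full control: orbit
sup.principal.healthy = False  # simulate a failure of the principal system
print(sup.step("orbit"))       # backup: landing
```

Never switching back to the principal system is a design choice in this sketch: once the principal has failed, its later outputs are not trusted.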
The advantage of this kind of architecture is that protection system software can be
much simpler than the software that is controlling the protected process. The only
function of the protection system is to monitor operation and to ensure that the system
is brought to a safe state in the event of an emergency. Therefore, it is possible to invest
more effort in fault avoidance and fault detection. You can check that the software

