


                                       2.  Errors are transient. A state variable may have an incorrect value caused by the
                                           execution of faulty code. However, before this value is accessed and causes a system
                                           failure, some other system input may be processed that resets the state to a valid value.

                                       3.  The system may include fault detection and protection mechanisms. These
                                           ensure that the erroneous behavior is discovered and corrected before the sys-
                                           tem services are affected.
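
The last two points can be illustrated with a minimal Python sketch. It is not taken from any particular system: the names GuardedValue and faulty_update, the valid range of 0 to 10, and the seeded fault are all assumptions made for illustration. An erroneous state is first overwritten by a later input before it is ever read (a transient error), and is then caught by a range check that restores the last valid value before the state is used.

```python
from dataclasses import dataclass


@dataclass
class GuardedValue:
    """Hypothetical state variable protected by a range check."""
    low: float
    high: float
    value: float
    last_valid: float
    corrections: int = 0

    def write(self, new_value: float) -> None:
        # Writes are unchecked, so faulty code can store an erroneous value.
        self.value = new_value

    def read(self) -> float:
        # Fault detection: check the state before it is used.
        if not (self.low <= self.value <= self.high):
            # Protection: restore the last value known to be valid.
            self.value = self.last_valid
            self.corrections += 1
        else:
            self.last_valid = self.value
        return self.value


def faulty_update(reading: float) -> float:
    # Seeded fault (an assumption for illustration): negative readings
    # are mishandled and produce an out-of-range result.
    return reading * 100 if reading < 0 else reading


level = GuardedValue(low=0.0, high=10.0, value=5.0, last_valid=5.0)

# Transient error: the erroneous state is never read, because a later
# input overwrites it with a valid value first.
level.write(faulty_update(-1.0))
transient = level.value
level.write(faulty_update(7.0))
after_reset = level.read()

# Detection and protection: the erroneous state is discovered and
# corrected at the point where it would otherwise be used.
level.write(faulty_update(-2.0))
recovered = level.read()
```

In both cases the fault in faulty_update is still present, but no caller ever observes an invalid value, so the fault does not lead to a failure.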

                                         Another reason why the faults in a system may not lead to system failures is that,
                                       in practice, users adapt their behavior to avoid using inputs that they know cause
                                       program failures. Experienced users ‘work around’ software features that they have
                                       found to be unreliable. For example, I avoid certain features, such as automatic
                                       numbering, in the word processing system that I used to write this book. When I used
                                       auto-numbering, it often went wrong. Repairing the faults in unused features makes
                                       no practical difference to the system reliability. As users share information on prob-
                                       lems and work-arounds, the effects of software problems are reduced.
                                         The distinction between faults, errors, and failures, explained in Figure 11.3,
                                       helps identify three complementary approaches that are used to improve the reliabil-
                                       ity of a system:

                                       1.  Fault avoidance: Development techniques are used that either minimize the
                                           possibility of human errors or trap mistakes before they result in the
                                           introduction of system faults. Examples of such techniques include avoiding
                                           error-prone programming language constructs, such as pointers, and using
                                           static analysis to detect program anomalies.

                                       2.  Fault detection and removal: Verification and validation techniques are used
                                           that increase the chances that faults will be detected and removed before the
                                           system is used. Systematic testing and debugging is an example of a
                                           fault-detection technique.

                                       3.  Fault tolerance: Techniques are used that ensure that faults in a system do not
                                           result in system errors or that system errors do not result in system failures. The
                                           incorporation of self-checking facilities in a system and the use of redundant
                                           system modules are examples of fault tolerance techniques.
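
As a rough illustration of the fault tolerance approach, the following Python sketch combines redundant modules with a simple self-check. Everything in it is hypothetical: the three squaring functions, the seeded fault in square_faulty, and the majority-voting function vote are assumptions chosen to keep the example small, not a description of how any real system is built.

```python
from collections import Counter
from typing import Callable, Sequence


def square_v1(x: int) -> int:
    return x * x


def square_v2(x: int) -> int:
    return x ** 2


def square_faulty(x: int) -> int:
    # Seeded fault (an assumption for illustration): the result is
    # wrong for negative inputs.
    return x * abs(x)


def vote(modules: Sequence[Callable[[int], int]], x: int) -> int:
    """Run every redundant module and return the majority result."""
    results = []
    for module in modules:
        try:
            results.append(module(x))
        except Exception:
            # A module that crashes simply loses its vote.
            continue
    if not results:
        raise RuntimeError("all modules failed")
    winner, count = Counter(results).most_common(1)[0]
    # Self-check: refuse to answer without a strict majority of all modules.
    if count * 2 <= len(modules):
        raise RuntimeError("no majority among redundant modules")
    return winner


modules = [square_v1, square_v2, square_faulty]
result_ok = vote(modules, 4)       # all three modules agree
result_masked = vote(modules, -3)  # the faulty module is outvoted
```

For the input -3 the faulty module produces an erroneous value, but the voter masks it, so the error does not become a failure. With only two modules that disagree, the self-check raises an exception rather than returning a result that might be wrong.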


                                         The practical application of these techniques is discussed in Chapter 13, which
                                       covers techniques for dependable software engineering.




                                11.3 Safety


                                       Safety-critical systems are systems where it is essential that system operation is
                                       always safe; that is, the system should never damage people or the system’s environ-
                                       ment even if the system fails. Examples of safety-critical systems include control