
342   Chapter 13   Dependability engineering


                                    The use of software engineering techniques, better programming languages, and better
                                    quality management has led to significant improvements in dependability for most
                                    software. Nevertheless, system failures may still occur that affect the system’s
                                    availability or lead to incorrect results being produced. In some cases, these failures
                                    cause only minor inconvenience, and system vendors may decide to live with them
                                    without correcting the errors in their systems. However, in some systems, failure
                                    can lead to loss of life or significant economic or reputational losses. These are known
                                    as ‘critical systems’, for which a high level of dependability is essential.
                                      Examples of critical systems include process control systems, protection systems
                                    that shut down other systems in the event of failure, medical systems, telecommunica-
                                    tions switches, and flight control systems. Special development tools and techniques
                                    may be used to enhance the dependability of the software in a critical system. These
                                    tools and techniques usually increase the costs of system development but they reduce
                                    the risk of system failure and the losses that may result from such a failure.
                                      Dependability engineering is concerned with the techniques that are used to
                                    enhance the dependability of both critical and non-critical systems. These techniques
                                    support three complementary approaches that are used in developing dependable
                                    software:


                                    1.  Fault avoidance: The software design and implementation process should use
                                        approaches to software development that help avoid design and programming
                                        errors and so minimize the number of faults that are likely to arise when the sys-
                                        tem is executing. Fewer faults mean less chance of run-time failures.

                                    2.  Fault detection and correction: The verification and validation processes are
                                        designed to discover and remove faults in a program, before it is deployed for
                                        operational use. Critical systems require very extensive verification and valida-
                                        tion to discover as many faults as possible before deployment and to convince
                                        the system stakeholders that the system is dependable. I cover this topic in
                                        Chapter 15.

                                    3.  Fault tolerance: The system is designed so that faults or unexpected system
                                        behavior during execution are detected at run-time and are managed in such a
                                        way that system failure does not occur. Simple approaches to fault tolerance
                                        based on built-in run-time checking may be included in all systems. However,
                                        more specialized fault-tolerance techniques (such as the use of fault-tolerant
                                        system architectures) are generally only used when a very high level of system
                                        availability and reliability is required.
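
The simple form of fault tolerance mentioned in item 3, built-in run-time checking, can be sketched as follows. This is a minimal illustration under stated assumptions, not a technique prescribed by the text: the class name `CheckedSensor`, the limits `MIN_VALID` and `MAX_VALID`, and the recovery policy of falling back to the last good value are all hypothetical.

```python
from typing import Optional

# Hypothetical validity limits for an illustrative temperature sensor (assumptions).
MIN_VALID = -40.0
MAX_VALID = 125.0


class CheckedSensor:
    """Wraps raw readings with a built-in run-time range check."""

    def __init__(self) -> None:
        self.last_good: Optional[float] = None  # last reading that passed the check
        self.rejected = 0                       # number of readings that failed it

    def accept(self, raw: float) -> Optional[float]:
        """Return a usable reading, or None if no valid reading has been seen yet."""
        # Detection: NaN (the only value not equal to itself) or out of range.
        if raw != raw or not (MIN_VALID <= raw <= MAX_VALID):
            self.rejected += 1
            # Recovery: fall back to the last value that passed the check,
            # so the erroneous reading does not propagate into a failure.
            return self.last_good
        self.last_good = raw
        return raw


sensor = CheckedSensor()
readings = [sensor.accept(r) for r in (20.0, 999.0, 21.5)]
# The out-of-range value 999.0 is replaced by the previous good reading, 20.0.
```

The check detects the erroneous state at run-time and manages it locally. Whether reusing the last good value is acceptable depends on the system; a critical system might instead switch to a safe state or raise an alarm.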


                                      Unfortunately, applying fault-avoidance, fault-detection, and fault-tolerance tech-
                                    niques leads to a situation of diminishing returns. The cost of finding and removing the
                                    remaining faults in a software system rises exponentially as program faults are discov-
                                    ered and removed (Figure 13.1). As the software becomes more reliable, you need to
                                    spend more and more time and effort to find fewer and fewer faults. At some stage,
                                    even for critical systems, the costs of this additional effort become unjustifiable.
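
The diminishing returns described above can be made concrete with a toy cost model. This is only a sketch under a loudly stated assumption: that removing each successive fault costs a fixed multiple (`growth`) of the previous one. The function name and the parameters `base_cost` and `growth` are hypothetical and are not taken from Figure 13.1.

```python
def faults_affordable(budget: float, base_cost: float = 1.0, growth: float = 2.0) -> int:
    """Count how many faults can be removed within a budget.

    Assumed toy model: the k-th fault removed (k = 0, 1, 2, ...) costs
    base_cost * growth**k, so each removal is more expensive than the last.
    """
    if base_cost <= 0 or growth < 1:
        raise ValueError("base_cost must be positive and growth must be at least 1")
    removed = 0
    spent = 0.0
    next_cost = base_cost
    # Keep removing faults while the next removal still fits in the budget.
    while spent + next_cost <= budget:
        spent += next_cost
        removed += 1
        next_cost *= growth
    return removed


small = faults_affordable(15)    # 1 + 2 + 4 + 8 = 15, so 4 faults
large = faults_affordable(1000)  # the first 9 removals cost 511; the 10th alone costs 512
```

Under these assumed parameters, a budget of 15 units removes 4 faults, while a budget of 1000 units removes only 9. Most of the larger budget is consumed by the last few faults, which is the pattern that eventually makes further effort unjustifiable.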