
12.3   Reliability specification  321


that compensate for software failure, there may also be related reliability
requirements to help detect and recover from hardware failures and operator errors.

Reliability is different from safety and security in that it is a measurable
system attribute. That is, it is possible to specify the level of reliability
that is required, monitor the system’s operation over time, and check if the
required reliability has been achieved. For example, a reliability requirement
might be that system failures that require a reboot should not occur more than
once per week. Every time such a failure occurs, it can be logged and you can
check if the required level of reliability has been achieved. If not, you
either modify your reliability requirement or submit a change request to
address the underlying system problems. You may decide to accept a lower level
of reliability because of the costs of changing the system to improve
reliability or because fixing the problem may have adverse side effects, such
as lower performance or throughput.
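
A minimal Python sketch of the monitoring step described above follows. It assumes that the failure log is simply a list of timestamps for failures that required a reboot. The function name `weeks_violating` and the one-failure-per-week limit are illustrative assumptions that mirror the example requirement; they are not taken from any real tool.

```python
from datetime import datetime, timedelta

# Illustrative requirement (assumed): failures that require a reboot
# should occur no more than once per week.
MAX_REBOOT_FAILURES_PER_WEEK = 1
WEEK = timedelta(days=7)


def weeks_violating(failure_times, start, end):
    """Return the start of each one-week window (stepping from `start`
    until `end`) in which logged reboot failures exceed the limit."""
    violations = []
    week_start = start
    while week_start < end:
        week_end = week_start + WEEK
        # Count logged failures that fall inside this week's window.
        count = sum(1 for t in failure_times if week_start <= t < week_end)
        if count > MAX_REBOOT_FAILURES_PER_WEEK:
            violations.append(week_start)
        week_start = week_end
    return violations


# Hypothetical log: two reboot failures fall in the week starting 8 January.
failure_log = [datetime(2024, 1, 2), datetime(2024, 1, 10), datetime(2024, 1, 12)]
print(weeks_violating(failure_log, datetime(2024, 1, 1), datetime(2024, 1, 22)))
# → [datetime.datetime(2024, 1, 8, 0, 0)]
```

A non-empty result is the signal that either the requirement or the system has to change. Fixed weekly windows are a simplifying choice. A sliding window would also catch two failures that straddle a window boundary.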

By contrast, both safety and security are about avoiding undesirable situations,
rather than specifying a desired ‘level’ of safety or security. Even one such
situation in the lifetime of a system may be unacceptable and, if it occurs,
system changes have to be made. It makes no sense to write requirements such as
‘system faults should result in fewer than 10 injuries per year.’ As soon as one
injury occurs, the system problem must be rectified.

Reliability requirements are, therefore, of two kinds:

1.  Non-functional requirements, which define the number of failures that are
    acceptable during normal use of the system, or the time in which the system
    is unavailable for use. These are quantitative reliability requirements.
2.  Functional requirements, which define system and software functions that
    avoid, detect, or tolerate faults in the software and so ensure that these
    faults do not lead to system failure.
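
Quantitative requirements of the first kind can be checked directly against operational data. The sketch below assumes a requirement stated as a minimum availability. The 0.999 figure is an invented example. The sketch computes the proportion of a measurement period during which the system was not down. The function names are hypothetical.

```python
from datetime import timedelta


def availability(period, outages):
    """Fraction of `period` during which the system was available, given
    the durations of the outages (timedelta objects) within that period."""
    downtime = sum(outages, timedelta())
    # Dividing one timedelta by another yields a float in Python 3.
    return 1 - downtime / period


def meets_availability(period, outages, required):
    """True if measured availability is at least the required level."""
    return availability(period, outages) >= required


month = timedelta(days=30)
print(meets_availability(month, [timedelta(minutes=20)], 0.999))  # → True
print(meets_availability(month, [timedelta(minutes=20), timedelta(minutes=25)], 0.999))  # → False
```

Thirty days is 43,200 minutes, so a 0.999 target allows at most 43.2 minutes of downtime. Under that target, 20 minutes of outage passes and 45 minutes does not.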

Quantitative reliability requirements lead to related functional system
requirements. To achieve some required level of reliability, the functional and
design requirements of the system should specify the faults to be detected and
the actions that should be taken to ensure that these faults do not lead to
system failures.
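
As an illustration of a functional reliability requirement, suppose the specification says (hypothetically): ‘If a transient fault is detected when calling an external service, the call shall be attempted at most three times before the fault is reported.’ A minimal Python sketch of that detect-and-tolerate behaviour follows. `TransientFault` and `call_with_retry` are invented names and do not come from any library.

```python
import logging


class TransientFault(Exception):
    """Hypothetical fault type: a temporary error that is worth retrying."""


def call_with_retry(operation, max_attempts=3):
    """Run `operation`, tolerating transient faults by retrying it.

    Each detected fault is logged. After `max_attempts` failed attempts
    the fault is re-raised so that it is reported rather than hidden.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientFault:
            logging.warning("transient fault on attempt %d", attempt)
            if attempt == max_attempts:
                raise


# Usage: a hypothetical operation that fails twice and then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientFault("service temporarily unavailable")
    return "ok"


print(call_with_retry(flaky))  # → ok
```

Only the named fault type is retried, and this is deliberate. Other exceptions propagate immediately, so the code handles exactly the faults that the requirement specifies.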

The process of reliability specification can be based on the general risk-driven
specification process shown in Figure 12.1:


1.  Risk identification. At this stage, you identify the types of system failures
    that may lead to economic losses of some kind. For example, an e-commerce
    system may be unavailable so that customers cannot place orders, or a failure
    that corrupts data may require time to restore the system database from a
    backup and rerun transactions that have been processed. The list of possible
    failure types, shown in Figure 12.6, can be used as a starting point for risk
    identification.

2.  Risk analysis. This involves estimating the costs and consequences of
    different types of software failure and selecting high-consequence failures
    for further analysis.