326 Chapter 12 Dependability and security specification
times per second. The projected lifetime of the system is 10 years. The total number
of system executions is therefore approximately $3 \times 10^{14}$. There is no point in
specifying that the rate of occurrence of failure should be $1/10^{15}$ executions (this allows
for some safety factor) as you cannot test the system for long enough to validate this
level of reliability.
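The arithmetic behind this argument can be checked with a short sketch. The 10-year lifetime and the total of 3 × 10^14 executions come from the text; the single test rig and its rate of 1,000 executions per second are illustrative assumptions, and `years_to_run` is a hypothetical helper.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000; leap days ignored


def years_to_run(executions: float, executions_per_second: float) -> float:
    """Wall-clock years needed to perform `executions` at a constant rate."""
    return executions / executions_per_second / SECONDS_PER_YEAR


# Figures taken from the text
lifetime_years = 10
total_executions = 3e14

# Aggregate execution rate implied by those two figures
implied_rate = total_executions / (lifetime_years * SECONDS_PER_YEAR)

# Assumption: one test rig that executes the system 1,000 times per second
test_rate = 1_000
years_needed = years_to_run(1e15, test_rate)

print(f"implied aggregate rate: {implied_rate:,.0f} executions/s")
print(f"years to run 1e15 test executions: {years_needed:,.0f}")
```

Under these assumptions, merely executing the system 10^15 times would take more than 30,000 years, which is why a requirement at this level cannot be validated by testing.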
Organizations must therefore be realistic about whether it is worth specifying and
validating a very high level of reliability. High reliability levels are clearly justified
in systems where reliable operation is critical, such as telephone switching systems,
or where system failure may result in large economic losses. They are probably not
justified for many types of business or scientific systems. Such systems have modest
reliability requirements, as the costs of failure are simply processing delays and it is
straightforward and relatively inexpensive to recover from these.
There are a number of steps that you can take to avoid the overspecification of
system reliability:
1. Specify the availability and reliability requirements for different types of fail-
ures. There should be a lower probability of serious failures occurring than
minor failures.
2. Specify the availability and reliability requirements for different services sepa-
rately. Failures that affect the most critical services should be specified as less
probable than those with only local effects. You may decide to limit the quanti-
tative reliability specification to the most critical system services.
3. Decide whether you really need high reliability in a software system or whether
the overall system dependability goals can be achieved in other ways. For exam-
ple, you may use error detection mechanisms to check the outputs of a system
and have processes in place to correct errors. There may then be no need for
a high level of reliability in the system that generates the outputs.
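The first two steps can be made concrete with a small sketch that records a separate requirement for each service and failure type, then checks that serious failures are specified as less probable than minor ones. The class names, service names, and probability values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    MINOR = 1
    SERIOUS = 2


@dataclass(frozen=True)
class ReliabilityRequirement:
    service: str
    severity: Severity
    max_failure_probability: float  # probability of failure on demand


def inconsistencies(reqs):
    """Return (serious, minor) pairs for the same service where the more
    serious failure is allowed to be at least as likely as the minor one."""
    problems = []
    for a in reqs:
        for b in reqs:
            if (a.service == b.service
                    and a.severity > b.severity
                    and a.max_failure_probability >= b.max_failure_probability):
                problems.append((a, b))
    return problems


# Hypothetical requirements; the second service is deliberately inconsistent
requirements = [
    ReliabilityRequirement("payment", Severity.SERIOUS, 1e-5),
    ReliabilityRequirement("payment", Severity.MINOR, 1e-3),
    ReliabilityRequirement("report_export", Severity.SERIOUS, 1e-2),
    ReliabilityRequirement("report_export", Severity.MINOR, 1e-3),
]

problems = inconsistencies(requirements)
for serious, minor in problems:
    print(f"{serious.service}: serious failures are not rarer than minor ones")
```

A check of this kind only flags internal inconsistencies in a specification. Choosing the actual probabilities, and deciding which services need a quantitative requirement at all, remains a judgment about the cost of failure.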
To illustrate the third of these steps, consider the reliability requirements for a bank
ATM system that dispenses cash and provides other services to customers. If there are
hardware or software ATM problems, these may lead to incorrect entries in the customer
account database. Such errors could be avoided by specifying a very high level of
hardware and software reliability in the ATM.
However, banks have many years of experience in identifying and correcting
incorrect account transactions. They use accounting methods to detect when things
have gone wrong. Most transactions that fail can simply be canceled, resulting in no
loss to the bank and minor customer inconvenience. Banks that run ATM networks
therefore accept that ATM failures may mean that a small number of transactions are
incorrect, but they consider it more cost-effective to fix these later than to incur
very high costs in avoiding faulty transactions.
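The idea of detecting and correcting faulty transactions after the fact can be sketched as a simple reconciliation check. This is only an illustration: the function name, the record layout (transaction id mapped to an amount), and the sample data are hypothetical, and real accounting procedures are far more involved.

```python
def find_discrepancies(atm_log, ledger):
    """Compare cash dispensed (ATM log) with amounts debited (ledger).

    Both arguments map a transaction id to an amount. Returns a dict of
    transaction id -> (dispensed, debited) for every mismatch.
    """
    mismatches = {}
    for txn_id in atm_log.keys() | ledger.keys():
        dispensed = atm_log.get(txn_id, 0)
        debited = ledger.get(txn_id, 0)
        if dispensed != debited:
            mismatches[txn_id] = (dispensed, debited)
    return mismatches


# Hypothetical data: transaction t2 debited the account but dispensed no cash
atm_log = {"t1": 100, "t2": 0, "t3": 50}
ledger = {"t1": 100, "t2": 40, "t3": 50}

mismatches = find_discrepancies(atm_log, ledger)
for txn_id, (dispensed, debited) in mismatches.items():
    print(f"{txn_id}: dispensed {dispensed}, debited {debited}; reverse the debit")
```

Here the faulty transaction is found and can be canceled later, so the ATM software itself does not need to guarantee that such an error never occurs.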
For a bank (and for the bank’s customers), the availability of the ATM network is
more important than whether or not individual ATM transactions fail. Lack of avail-
ability means more demand on counter services, customer dissatisfaction, engineer-
ing costs to repair the network, etc. Therefore, for transaction-based systems, such as