12.3 Reliability specification 325

2. It provides a basis for assessing when to stop testing a system. You stop when
the system has achieved its required reliability level.
3. It is a means of assessing different design strategies intended to improve the
reliability of a system. You can make a judgment about how each strategy might
lead to the required levels of reliability.
4. If a regulator has to approve a system before it goes into service (e.g., all systems
that are critical to flight safety on an aircraft are regulated), then evidence that
a required reliability target has been met is important for system certification.

To establish the required level of system reliability, you have to consider the associated losses that could result from a system failure. These include not only financial losses but also damage to a business's reputation. Loss of reputation means that customers will go elsewhere. Although the short-term losses from a system failure may be relatively small, the longer-term losses may be much more significant. For example, if you try to access an e-commerce site and find that it is unavailable, you may try to find what you want elsewhere rather than wait for the system to become available. If this happens more than once, you will probably not shop at that site again.

The problem with specifying reliability using metrics such as POFOD, ROCOF, and AVAIL is that it is possible to overspecify reliability and thus incur high development and validation costs. This happens because system stakeholders find it difficult to translate their practical experience into quantitative specifications. They may think that a POFOD of 0.001 (1 failure in 1,000 demands) represents a relatively unreliable system. However, as I have explained, if demands for a service are uncommon, it actually represents a very high level of reliability.
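To make the last point concrete, this Python sketch converts a POFOD of 0.001 into an expected number of failures per year for two demand rates. The demand rates (one demand per week, 10,000 demands per day) are hypothetical figures chosen for illustration, and the calculation assumes that every demand has the same, independent probability of failure.

```python
POFOD = 0.001  # 1 failure in 1,000 demands


def failures_per_year(pofod: float, demands_per_year: float) -> float:
    """Expected failures per year, assuming each demand fails independently."""
    return pofod * demands_per_year


rare = failures_per_year(POFOD, 52)            # hypothetical: one demand per week
busy = failures_per_year(POFOD, 10_000 * 365)  # hypothetical: 10,000 demands per day

print(f"Weekly use: one failure every {1 / rare:.1f} years on average")
print(f"Heavy use: about {busy / 365:.0f} failures per day")
```

The same POFOD corresponds to roughly one failure in two decades for the rarely used service, but to about ten failures a day for the heavily used one.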

If you specify reliability as a metric, it is obviously important to check that the required level of reliability has been achieved. You do this assessment as part of system testing. To assess the reliability of a system statistically, you have to observe a number of failures. If you have specified, for example, a POFOD of 0.0001 (1 failure in 10,000 demands), then you may have to design tests that make 50,000 or 60,000 demands on the system so that several failures are observed. It may be practically impossible to design and implement this number of tests. Therefore, overspecification of reliability leads to very high testing costs.
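The arithmetic behind these test volumes can be sketched in Python. This sketch assumes, as a simplification, that each test demand fails independently with probability equal to the POFOD (a binomial model); the function names are hypothetical, and the threshold of three failures standing in for "several" is an arbitrary illustrative choice.

```python
import math

POFOD = 0.0001  # 1 failure in 10,000 demands


def expected_failures(pofod: float, demands: int) -> float:
    """Mean number of failures observed over `demands` test demands."""
    return pofod * demands


def prob_at_least(k: int, pofod: float, demands: int) -> float:
    """Probability of observing at least k failures in `demands` test demands.

    Binomial model: each demand fails independently with probability `pofod`.
    """
    prob_fewer = sum(
        math.comb(demands, i) * pofod**i * (1 - pofod) ** (demands - i)
        for i in range(k)
    )
    return 1.0 - prob_fewer


for n in (10_000, 50_000, 60_000):
    print(f"{n:>6} demands: expect {expected_failures(POFOD, n):.1f} failures, "
          f"P(at least 3 observed) = {prob_at_least(3, POFOD, n):.2f}")
```

Under this model, 10,000 demands yield only about one expected failure, and observing several failures becomes likely only at around 50,000 to 60,000 demands.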

When you specify the availability of a system, you may have similar problems. Although a very high level of availability may seem desirable, most systems have very intermittent demand patterns (e.g., a business system will mostly be used during normal business hours), and a single availability figure does not really reflect user needs. You need high availability when the system is being used but not at other times. Depending, of course, on the type of system, there may be no real practical difference between an availability of 0.999 and an availability of 0.9999.
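One way to see why 0.999 and 0.9999 may be practically indistinguishable is to convert them into expected downtime. This Python sketch assumes a round-the-clock year of 8,760 hours and, for the business-system case, a hypothetical 40-hour week over 52 weeks (2,080 hours). It also assumes that outages are spread evenly over time, which real outages rarely are.

```python
HOURS_PER_YEAR = 365 * 24          # 8,760 hours, ignoring leap years
BUSINESS_HOURS_PER_YEAR = 40 * 52  # hypothetical: 8 hours a day, 5 days a week


def downtime_hours(availability: float, hours_of_use: float) -> float:
    """Expected hours of unavailability during `hours_of_use` hours.

    Assumes outages are spread evenly over time, so the fraction of
    unavailable time is the same whenever the system is in use.
    """
    return (1.0 - availability) * hours_of_use


for avail in (0.999, 0.9999):
    all_year = downtime_hours(avail, HOURS_PER_YEAR)
    in_office = downtime_hours(avail, BUSINESS_HOURS_PER_YEAR)
    print(f"AVAIL {avail}: {all_year:.2f} h/year in total, "
          f"{in_office * 60:.0f} min/year during business hours")
```

Under these assumptions, moving from 0.999 to 0.9999 cuts downtime during business hours from about two hours to about 12 minutes a year, a difference that may or may not matter to the users of a particular system.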

A fundamental problem with overspecification is that it may be practically impossible to show that a very high level of reliability or availability has been achieved. For example, say a system was intended for use in a safety-critical application and was therefore required to never fail over its total lifetime. Assume that 1,000 copies of the system are to be installed and the system is executed 1,000
