12.3 Reliability specification
Availability   Explanation
0.9            The system is available for 90% of the time. This means that, in a 24-hour period (1,440 minutes), the system will be unavailable for 144 minutes.
0.99           In a 24-hour period, the system is unavailable for 14.4 minutes.
0.999          The system is unavailable for 86.4 seconds in a 24-hour period.
0.9999         The system is unavailable for 8.64 seconds in a 24-hour period. This is roughly one minute per week.

Figure 12.7 Availability specification

failure. So, a POFOD of 0.001 means that there is a 1/1,000 chance that a failure
will occur when a demand is made.
2. Rate of occurrence of failures (ROCOF) This metric sets out the number
of system failures that are likely to be observed relative to a certain time
period (e.g., an hour), or to the number of system executions. In the example
above, the ROCOF is 1/1,000. The reciprocal of ROCOF is the mean time to
failure (MTTF), which is sometimes used as a reliability metric. MTTF is the
average number of time units between observed system failures. Therefore,
a ROCOF of two failures per hour implies that the mean time to failure is
30 minutes.
3. Availability (AVAIL) The availability of a system reflects its ability to deliver
services when requested. AVAIL is the probability that a system will be operational
when a demand is made for service. Therefore, an availability of 0.9999 means
that, on average, the system will be available for 99.99% of the operating
time. Figure 12.7 shows what different levels of availability mean in practice.
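The arithmetic behind these metrics can be sketched in a few lines of Python. This is only an illustration: the function names mttf_from_rocof and downtime_seconds are hypothetical, and the sketch assumes that availability is measured over a fixed 24-hour operating period.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in the assumed 24-hour period


def mttf_from_rocof(rocof):
    """Return the mean time to failure as the reciprocal of ROCOF.

    The result is in the same time unit that ROCOF is expressed in
    (e.g., hours if ROCOF is failures per hour).
    """
    if rocof <= 0:
        raise ValueError("ROCOF must be positive")
    return 1.0 / rocof


def downtime_seconds(availability, period_seconds=SECONDS_PER_DAY):
    """Return the expected unavailable time within the given period."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * period_seconds


if __name__ == "__main__":
    # Two failures per hour: MTTF in hours, converted to minutes.
    print("MTTF (minutes):", mttf_from_rocof(2) * 60)

    # Expected downtime per day for several availability levels.
    for availability in (0.9, 0.99, 0.999, 0.9999):
        seconds = downtime_seconds(availability)
        print(availability, "->", round(seconds, 2), "seconds per day")
```

For a ROCOF of two failures per hour, the sketch gives an MTTF of 30 minutes, and for an availability of 0.9 it gives 8,640 seconds (144 minutes) of downtime per day.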
POFOD should be used as a reliability metric in situations where a failure on
demand can lead to a serious system failure. This applies irrespective of the fre-
quency of the demands. For example, a protection system that monitors a chemical
reactor and shuts down the reaction if it is overheating should have its reliability
specified using POFOD. Generally, demands on a protection system are infrequent
as the system is a last line of defense, after all other recovery strategies have failed.
Therefore, a POFOD of 0.001 (1 failure in 1,000 demands) might seem to be risky,
but if there are only two or three demands on the system in its lifetime, then you will
probably never see a system failure.
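To see why so few demands make a failure unlikely, the following Python sketch computes the chance of observing at least one failure over the system's lifetime. It assumes, for illustration only, that demands are independent and that each has the same probability of failure; the function name prob_at_least_one_failure is hypothetical.

```python
def prob_at_least_one_failure(pofod, demands):
    """Probability that at least one of `demands` demands fails.

    Assumes each demand fails independently with probability `pofod`,
    so the chance that all demands succeed is (1 - pofod) ** demands.
    """
    if not 0.0 <= pofod <= 1.0:
        raise ValueError("POFOD must be between 0 and 1")
    if demands < 0:
        raise ValueError("number of demands cannot be negative")
    return 1.0 - (1.0 - pofod) ** demands


if __name__ == "__main__":
    # POFOD of 0.001 with only two or three lifetime demands.
    for demands in (2, 3):
        print(demands, "demands:", prob_at_least_one_failure(0.001, demands))
```

Under these assumptions, three demands at a POFOD of 0.001 give roughly a 0.3% chance of ever seeing a failure.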
ROCOF is the most appropriate metric to use in situations where demands on sys-
tems are made regularly rather than intermittently. For example, in a system that han-
dles a large number of transactions, you may specify a ROCOF of 10 failures per
day. This means that you are willing to accept that an average of 10 transactions per
day will not complete successfully and will have to be canceled. Alternatively, you
may specify ROCOF as the number of failures per 1,000 transactions.
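The two ways of expressing ROCOF can be related if the transaction volume is known. The Python sketch below converts a per-day figure into failures per 1,000 transactions. The volume of 50,000 transactions per day is an invented figure used only for illustration, and the function name rocof_per_1000_transactions is hypothetical.

```python
def rocof_per_1000_transactions(failures_per_day, transactions_per_day):
    """Convert a ROCOF in failures per day to failures per 1,000 transactions."""
    if transactions_per_day <= 0:
        raise ValueError("transactions_per_day must be positive")
    return failures_per_day / transactions_per_day * 1000


if __name__ == "__main__":
    # ROCOF of 10 failures per day, with an assumed (hypothetical)
    # load of 50,000 transactions per day.
    print(rocof_per_1000_transactions(10, 50_000))
```

With the assumed load, 10 failures per day corresponds to about 0.2 failures per 1,000 transactions.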
If the absolute time between failures is important, you may specify the reliability
as the mean time between failures. For example, if you are specifying the required