
324   Chapter 12   Dependability and security specification


                                    reliability for a system with long transactions (such as a computer-aided design
                                    system), you should specify the reliability with a long mean time to failure. The MTTF
                                    should be much longer than the average time that a user works on his or her models
                                    without saving the results. This means that users are unlikely to lose
                                    work through a system failure in any one session.
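
To give a feel for what "much longer" means here, the following minimal sketch estimates the chance that a failure interrupts an unsaved work session. It assumes that failures occur at a constant rate (an exponential failure model), which the text does not state; the function name and the figures are hypothetical and chosen only for illustration.

```python
import math


def prob_failure_during_session(session_hours: float, mttf_hours: float) -> float:
    """Probability of at least one failure during a session.

    Assumption (not from the text): failures occur at a constant rate,
    so the time to failure is exponentially distributed with mean MTTF.
    """
    if session_hours < 0 or mttf_hours <= 0:
        raise ValueError("session_hours must be >= 0 and mttf_hours must be > 0")
    return 1.0 - math.exp(-session_hours / mttf_hours)


# Hypothetical figures: a user works for 2 hours between saves.
for mttf in (2, 20, 200):
    p = prob_failure_during_session(2.0, mttf)
    print(f"MTTF {mttf:>3} h: P(failure in a 2 h session) = {p:.3f}")
```

Under this assumption, an MTTF equal to the unsaved working time leaves roughly a 63% chance of a failure during the session, whereas an MTTF one hundred times longer reduces it to about 1%.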
                                      To assess the reliability of a system, you have to capture data about its operation.
                                    The data required may include:


                                    1.  The number of system failures given a number of requests for system services.
                                        This is used to measure the POFOD.

                                    2.  The time or the number of transactions between system failures plus the total
                                        elapsed time or total number of transactions. This is used to measure ROCOF
                                        and MTTF.
                                    3.  The repair or restart time after a system failure that leads to loss of service. This
                                        is used in the measurement of availability. Availability does not just depend on
                                        the time between failures but also on the time required to get the system back
                                        into operation.
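
The data items above can be turned into metric values with a few simple ratios. The sketch below uses basic point estimates only; the function names, parameter names, and sample figures are hypothetical, and MTTF is estimated as operating time divided by the number of failures, which is a simplifying assumption rather than a method given in the text.

```python
def pofod(failed_requests: int, total_requests: int) -> float:
    """Probability of failure on demand: failures per service request."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return failed_requests / total_requests


def rocof(failures: int, operating_time: float) -> float:
    """Rate of occurrence of failures per unit of operating time."""
    if operating_time <= 0:
        raise ValueError("operating_time must be positive")
    return failures / operating_time


def mttf(failures: int, operating_time: float) -> float:
    """Mean time to failure, estimated as operating time per failure."""
    if failures == 0:
        return float("inf")  # no failure observed in the measurement period
    return operating_time / failures


def availability(operating_time: float, downtime: float) -> float:
    """Fraction of total time during which the system was in operation."""
    total = operating_time + downtime
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return operating_time / total


# Hypothetical observations, with time measured in hours.
print("POFOD:", pofod(failed_requests=2, total_requests=10_000))
print("ROCOF:", rocof(failures=4, operating_time=1_000.0), "failures/hour")
print("MTTF:", mttf(failures=4, operating_time=1_000.0), "hours")
print("Availability:", round(availability(1_000.0, 5.0), 4))
```

Note that availability uses the repair or restart time as well as the operating time, which is why two systems with the same MTTF can have different availability.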

                                      The time units that may be used are calendar time, processor time, or a discrete
                                    unit such as the number of transactions. In systems that spend much of their time
                                    waiting to respond to a service request, such as telephone switching systems, the time
                                    unit that should be used is processor time. If you use calendar time, then this will
                                    include the time when the system was doing nothing.
                                      You should use calendar time for systems that are in continuous operation.
                                    Monitoring systems, such as alarm systems, and other types of process control
                                    systems fall into this category. Systems that process transactions such as bank ATMs or
                                    airline reservation systems have variable loads placed on them depending on the time
                                    of day. In these cases, the unit of ‘time’ used could be the number of transactions (i.e.,
                                    the ROCOF would be the number of failed transactions per N thousand transactions).
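
As a small illustration of a transaction-based unit, the sketch below expresses ROCOF as failed transactions per N thousand transactions. The function name and the sample counts are hypothetical.

```python
def rocof_per_n_thousand(failed_transactions: int,
                         total_transactions: int,
                         n: int = 1) -> float:
    """ROCOF expressed as failed transactions per n thousand transactions."""
    if total_transactions <= 0 or n <= 0:
        raise ValueError("total_transactions and n must be positive")
    return failed_transactions / total_transactions * n * 1000


# Hypothetical log: 3 failed transactions out of 1,500,000 processed.
print(rocof_per_n_thousand(3, 1_500_000))         # per thousand transactions
print(rocof_per_n_thousand(3, 1_500_000, n=100))  # per hundred thousand transactions
```

Because the measure is normalized by the number of transactions rather than by elapsed time, it is not distorted by quiet periods when few requests arrive.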


                            12.3.2 Non-functional reliability requirements
                                    Non-functional reliability requirements are quantitative specifications of the
                                    required reliability and availability of a system, calculated using one of the metrics
                                    described in the previous section. Quantitative reliability and availability
                                    specification has been used for many years in safety-critical systems but is only rarely
                                    used in business-critical systems. However, as more and more companies demand 24/7
                                    service from their systems, it is likely that such techniques will be increasingly used.
                                      There are several advantages in deriving quantitative reliability specifications:

                                    1.  The process of deciding the required level of reliability helps to clarify
                                        what stakeholders really need. It helps stakeholders understand that there are
                                        different types of system failure, and it makes clear to them that high levels of
                                        reliability are very expensive to achieve.