

2.  The systems approach. The basic assumption is that people are fallible and will make mistakes. The errors that people make are often a consequence of system design decisions that lead to erroneous ways of working, or of organizational factors that affect the system operators. Good systems should recognize the possibility of human error and include barriers and safeguards that detect human errors and allow the system to recover before failure occurs. When a failure does occur, the issue is not to find an individual to blame but to understand how and why the system defenses did not trap the error.

I believe that the systems approach is the right one and that systems engineers should assume that human errors will occur during system operation. Therefore, to improve the security and dependability of a system, designers have to think about the defenses and barriers to human error that should be included in a system. They should also think about whether these barriers should be built into the technical components of the system. If not, they could be part of the processes and procedures for using the system or could be operator guidelines that are reliant on human checking and judgment.

Examples of defenses that may be included in a system are:

1.  An air traffic control system may include an automated conflict alert system. When a controller instructs an aircraft to change its speed or altitude, the system extrapolates its trajectory to see if it could intersect with any other aircraft. If so, it sounds an alarm. A simple version of this check is sketched after this list.

2.  The same system may have a clearly defined procedure to record the control instructions that have been issued. These procedures help the controller check whether they have issued the instruction correctly and make the information available to others for checking.

3.  Air traffic control usually involves a team of controllers who constantly monitor each other's work. Therefore, when a mistake is made, it is likely that it will be detected and corrected before an incident occurs.
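
To make the first of these defenses concrete, the sketch below shows one way such a conflict alert check might work: each aircraft's current trajectory is extrapolated in a straight line and compared with the others at regular intervals, and an alarm is raised if the predicted horizontal and vertical separation both fall below a minimum. This is a minimal illustration, not how any real conflict alert system is implemented; the class, the function names, the separation minima, and the look-ahead window are all invented for the example.

    from dataclasses import dataclass
    from math import hypot

    # Illustrative separation minima and look-ahead; values invented for the example.
    MIN_HORIZONTAL_NM = 5.0     # nautical miles
    MIN_VERTICAL_FT = 1000.0    # feet
    LOOKAHEAD_S = 120           # how far ahead to extrapolate, in seconds
    STEP_S = 5                  # extrapolation time step, in seconds


    @dataclass
    class Aircraft:
        callsign: str
        x_nm: float             # horizontal position (nautical miles, arbitrary origin)
        y_nm: float
        altitude_ft: float
        vx_nm_s: float          # horizontal velocity components (nm per second)
        vy_nm_s: float
        climb_ft_s: float       # vertical rate (feet per second)

        def position_at(self, t: float) -> tuple[float, float, float]:
            """Straight-line extrapolation of the current trajectory."""
            return (self.x_nm + self.vx_nm_s * t,
                    self.y_nm + self.vy_nm_s * t,
                    self.altitude_ft + self.climb_ft_s * t)


    def conflict_alert(changed: Aircraft, others: list[Aircraft]) -> list[str]:
        """Return the callsigns of aircraft whose extrapolated trajectory comes
        too close to the aircraft whose speed or altitude was just changed."""
        alarms = []
        for other in others:
            for t in range(0, LOOKAHEAD_S + 1, STEP_S):
                x1, y1, alt1 = changed.position_at(t)
                x2, y2, alt2 = other.position_at(t)
                too_close_horizontally = hypot(x1 - x2, y1 - y2) < MIN_HORIZONTAL_NM
                too_close_vertically = abs(alt1 - alt2) < MIN_VERTICAL_FT
                if too_close_horizontally and too_close_vertically:
                    alarms.append(other.callsign)
                    break   # one predicted loss of separation is enough to alarm
        return alarms

A check like this inevitably trades missed conflicts against false alarms, which is exactly the kind of weakness discussed below.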


Inevitably, all barriers have weaknesses of some kind. Reason calls these ‘latent conditions’ as they usually only contribute to system failure when some other problem occurs. For example, in the above defenses, a weakness of a conflict alert system is that it may lead to many false alarms. Controllers may therefore ignore warnings from the system. A weakness of a procedural system may be that unusual but essential information can’t be easily recorded. Human checking may fail when all of the people involved are under stress and make the same mistake.
Latent conditions lead to system failure when the defenses built into the system do not trap an active failure by a system operator. The human error is a trigger for the failure but should not be considered to be its sole cause. Reason explains this using his well-known ‘Swiss cheese’ model of system failure (Figure 10.9).
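
As a purely illustrative reading of this model (not taken from Reason or from this book), the toy simulation below treats each defensive layer as a check that traps an operator error with some probability; the ‘holes’ are the occasions on which a layer fails, and an error reaches the system boundary as a failure only when the holes in every layer line up. The layer names echo the air traffic control examples above, but the probabilities are invented and the layers are assumed, unrealistically, to fail independently.

    import random

    # Toy 'Swiss cheese' model: each defensive layer traps an error with some
    # probability; a latent condition is a gap that lets the error through.
    # The probabilities here are invented purely for illustration.
    LAYERS = {
        "automated conflict alert": 0.90,   # chance the layer traps the error
        "recording procedure":      0.70,
        "colleague cross-checking": 0.80,
    }


    def error_becomes_failure(rng: random.Random) -> bool:
        """An active error causes a failure only if it passes through a hole
        (a latent condition) in every defensive layer."""
        for layer, p_trap in LAYERS.items():
            if rng.random() < p_trap:
                return False        # this layer trapped the error
        return True                 # all the holes lined up


    def estimate_failure_rate(trials: int = 100_000, seed: int = 1) -> float:
        rng = random.Random(seed)
        failures = sum(error_becomes_failure(rng) for _ in range(trials))
        return failures / trials


    if __name__ == "__main__":
        # With the probabilities above, roughly 0.1 * 0.3 * 0.2 = 0.6% of
        # operator errors would slip through every layer and become failures.
        print(f"Estimated failure rate: {estimate_failure_rate():.3%}")

Even with these made-up numbers, the point of the model comes through: each additional or strengthened layer multiplies down the chance that a single human error becomes a system failure, and failures occur only when weaknesses in all of the layers coincide.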