

2.  The systems approach. The basic assumption is that people are fallible and will make mistakes. The errors that people make are often a consequence of system design decisions that lead to erroneous ways of working, or of organizational factors that affect the system operators. Good systems should recognize the possibility of human error and include barriers and safeguards that detect human errors and allow the system to recover before failure occurs. When a failure does occur, the issue is not to find an individual to blame but to understand how and why the system defenses did not trap the error.

I believe that the systems approach is the right one and that systems engineers should assume that human errors will occur during system operation. Therefore, to improve the security and dependability of a system, designers have to think about the defenses and barriers to human error that should be included in a system. They should also think about whether these barriers should be built into the technical components of the system. If not, they could be part of the processes and procedures for using the system or could be operator guidelines that are reliant on human checking and judgment.

Examples of defenses that may be included in a system are:

1.  An air traffic control system may include an automated conflict alert system. When a controller instructs an aircraft to change its speed or altitude, the system extrapolates its trajectory to see if it could intersect with any other aircraft. If so, it sounds an alarm. A simple version of this check is sketched after this list.

2.  The same system may have a clearly defined procedure to record the control instructions that have been issued. These procedures help the controller check whether they have issued the instruction correctly and make the information available to others for checking.

3.  Air traffic control usually involves a team of controllers who constantly monitor each other's work. Therefore, when a mistake is made, it is likely that it will be detected and corrected before an incident occurs.
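
To make the first of these defenses concrete, the sketch below shows one way such a conflict alert check might work: each aircraft's current trajectory is extrapolated in a straight line and compared with the others at regular intervals, and an alarm is raised if the predicted horizontal and vertical separation both fall below a minimum. This is a minimal illustration, not how any real conflict alert system is implemented; the class, the function names, the separation minima, and the look-ahead window are all invented for the example.

    from dataclasses import dataclass
    from math import hypot

    # Illustrative separation minima and look-ahead; values invented for the example.
    MIN_HORIZONTAL_NM = 5.0     # nautical miles
    MIN_VERTICAL_FT = 1000.0    # feet
    LOOKAHEAD_S = 120           # how far ahead to extrapolate, in seconds
    STEP_S = 5                  # extrapolation time step, in seconds


    @dataclass
    class Aircraft:
        callsign: str
        x_nm: float             # horizontal position (nautical miles, arbitrary origin)
        y_nm: float
        altitude_ft: float
        vx_nm_s: float          # horizontal velocity components (nm per second)
        vy_nm_s: float
        climb_ft_s: float       # vertical rate (feet per second)

        def position_at(self, t: float) -> tuple[float, float, float]:
            """Straight-line extrapolation of the current trajectory."""
            return (self.x_nm + self.vx_nm_s * t,
                    self.y_nm + self.vy_nm_s * t,
                    self.altitude_ft + self.climb_ft_s * t)


    def conflict_alert(changed: Aircraft, others: list[Aircraft]) -> list[str]:
        """Return the callsigns of aircraft whose extrapolated trajectory comes
        too close to the aircraft whose speed or altitude was just changed."""
        alarms = []
        for other in others:
            for t in range(0, LOOKAHEAD_S + 1, STEP_S):
                x1, y1, alt1 = changed.position_at(t)
                x2, y2, alt2 = other.position_at(t)
                too_close_horizontally = hypot(x1 - x2, y1 - y2) < MIN_HORIZONTAL_NM
                too_close_vertically = abs(alt1 - alt2) < MIN_VERTICAL_FT
                if too_close_horizontally and too_close_vertically:
                    alarms.append(other.callsign)
                    break   # one predicted loss of separation is enough to alarm
        return alarms

A check like this inevitably trades missed conflicts against false alarms, which is exactly the kind of weakness discussed below.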


Inevitably, all barriers have weaknesses of some kind. Reason calls these ‘latent conditions’ as they usually only contribute to system failure when some other problem occurs. For example, in the above defenses, a weakness of a conflict alert system is that it may lead to many false alarms. Controllers may therefore ignore warnings from the system. A weakness of a procedural system may be that unusual but essential information can’t be easily recorded. Human checking may fail when all of the people involved are under stress and make the same mistake.
Latent conditions lead to system failure when the defenses built into the system do not trap an active failure by a system operator. The human error is a trigger for the failure but should not be considered to be its sole cause. Reason explains this using his well-known ‘Swiss cheese’ model of system failure (Figure 10.9).
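
As a purely illustrative reading of this model (not taken from Reason or from this book), the toy simulation below treats each defensive layer as a check that traps an operator error with some probability; the ‘holes’ are the occasions on which a layer fails, and an error reaches the system boundary as a failure only when the holes in every layer line up. The layer names echo the air traffic control examples above, but the probabilities are invented and the layers are assumed, unrealistically, to fail independently.

    import random

    # Toy 'Swiss cheese' model: each defensive layer traps an error with some
    # probability; a latent condition is a gap that lets the error through.
    # The probabilities here are invented purely for illustration.
    LAYERS = {
        "automated conflict alert": 0.90,   # chance the layer traps the error
        "recording procedure":      0.70,
        "colleague cross-checking": 0.80,
    }


    def error_becomes_failure(rng: random.Random) -> bool:
        """An active error causes a failure only if it passes through a hole
        (a latent condition) in every defensive layer."""
        for layer, p_trap in LAYERS.items():
            if rng.random() < p_trap:
                return False        # this layer trapped the error
        return True                 # all the holes lined up


    def estimate_failure_rate(trials: int = 100_000, seed: int = 1) -> float:
        rng = random.Random(seed)
        failures = sum(error_becomes_failure(rng) for _ in range(trials))
        return failures / trials


    if __name__ == "__main__":
        # With the probabilities above, roughly 0.1 * 0.3 * 0.2 = 0.6% of
        # operator errors would slip through every layer and become failures.
        print(f"Estimated failure rate: {estimate_failure_rate():.3%}")

Even with these made-up numbers, the point of the model comes through: each additional or strengthened layer multiplies down the chance that a single human error becomes a system failure, and failures occur only when weaknesses in all of the layers coincide.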