212 PART TWO MANAGING SOFTWARE PROJECTS
A comprehensive discussion of statistical SQA is beyond the scope of this book.
Interested readers should see [SCH98], [KAP95], or [KAN95].
8.8 SOFTWARE RELIABILITY
There is no doubt that the reliability of a computer program is an important element
of its overall quality. If a program repeatedly and frequently fails to perform, it mat-
ters little whether other software quality factors are acceptable.
WebRef: The Reliability Analysis Center provides much useful information on
reliability, maintainability, supportability, and quality at rac.iitri.org
Software reliability, unlike many other quality factors, can be measured directly and
estimated using historical and developmental data. Software reliability is defined in
statistical terms as "the probability of failure-free operation of a computer program in a
specified environment for a specified time" [MUS87]. To illustrate, program X is estimated
to have a reliability of 0.96 over eight elapsed processing hours. In other words, if
program X were executed 100 times, each run requiring eight hours of elapsed processing
time (execution time), it would likely operate correctly (without failure) 96 times out of 100.
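The program X illustration can be sketched as a simple calculation. This is a minimal sketch, assuming reliability is estimated as the fraction of observed runs that complete without failure; the function name `estimate_reliability` and the run data are hypothetical and are not taken from [MUS87].

```python
def estimate_reliability(outcomes):
    """Return the fraction of runs that completed without failure.

    `outcomes` is a sequence of booleans, one per run of fixed duration
    (True = the run finished without a failure).
    """
    if not outcomes:
        raise ValueError("need at least one observed run")
    failure_free = sum(1 for ok in outcomes if ok)
    return failure_free / len(outcomes)


# Hypothetical data: 100 eight-hour runs of program X, 4 of which failed.
runs = [True] * 96 + [False] * 4
reliability = estimate_reliability(runs)
print(f"Estimated reliability over 8 hours: {reliability:.2f}")
```

The estimate is only meaningful for the environment and the time interval in which the runs were observed, which is why the definition names both.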
Whenever software reliability is discussed, a pivotal question arises: What is meant
by the term failure? In the context of any discussion of software quality and reliabil-
ity, failure is nonconformance to software requirements. Yet, even within this defin-
ition, there are gradations. Failures can be only annoying or catastrophic. One failure
can be corrected within seconds while another requires weeks or even months to
correct. Complicating the issue even further, the correction of one failure may in fact
result in the introduction of other errors that ultimately result in other failures.
8.8.1 Measures of Reliability and Availability
Early work in software reliability attempted to extrapolate the mathematics of hard-
ware reliability theory (e.g., [ALV64]) to the prediction of software reliability. Most
hardware-related reliability models are predicated on failure due to wear rather than
failure due to design defects. In hardware, failures due to physical wear (e.g., the
effects of temperature, corrosion, shock) are more likely than a design-related failure.
Unfortunately, the opposite is true for software. In fact, all software failures can
be traced to design or implementation problems; wear (see Chapter 1) does not enter
into the picture.
Software reliability problems can almost always be traced to errors in design or
implementation.
There has been debate over the relationship between key concepts in hardware
reliability and their applicability to software (e.g., [LIT89], [ROO90]). Although an
irrefutable link has yet to be established, it is worthwhile to consider a few simple
concepts that apply to both system elements.
If we consider a computer-based system, a simple measure of reliability is mean-
time-between-failure (MTBF), where
MTBF = MTTF + MTTR
The acronyms MTTF and MTTR are mean-time-to-failure and mean-time-to-repair,
respectively.
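As a worked illustration of the relation MTBF = MTTF + MTTR, the following is a minimal sketch, assuming a log in which each entry records the hours a system ran before failing and the hours needed to repair it. The log values and the function name `mtbf_from_log` are hypothetical.

```python
from statistics import mean


def mtbf_from_log(cycles):
    """Compute (MTTF, MTTR, MTBF) from (hours_to_failure, hours_to_repair) pairs."""
    if not cycles:
        raise ValueError("need at least one failure/repair cycle")
    mttf = mean(up for up, _ in cycles)          # mean-time-to-failure
    mttr = mean(repair for _, repair in cycles)  # mean-time-to-repair
    return mttf, mttr, mttf + mttr               # MTBF = MTTF + MTTR


# Hypothetical log: three failure/repair cycles, all times in hours.
log = [(120.0, 4.0), (95.0, 2.5), (143.0, 5.5)]
mttf, mttr, mtbf = mtbf_from_log(log)
print(f"MTTF = {mttf:.1f} h, MTTR = {mttr:.1f} h, MTBF = {mtbf:.1f} h")
```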