Page 212 - Pipeline Risk Management Manual Ideas, Techniques, and Resources
results of the risk assessment and should understand and agree with all underlying assumptions, calculations, and protocols employed.

This book may provide some of the background documentation necessary for a software program that incorporates a model similar to the one described here. It contains explanations as to why and how certain variables are given more points than others and why certain variables are considered at all. Where the book may provide the rationale behind the risk assessment, the software documentation must additionally note the workings of all routines, the structure of the data, and all aspects of the program. A data dictionary is normally included in the software documentation.

IX. Data analysis

An earlier chapter made a connection between the quality process (total quality management, continuous improvement, etc.) and risk management. In striving to truly understand work processes, measurement becomes increasingly important. Once measurement is done, analysis of the resulting data is the next step. Here again, the connection between quality and risk is useful. Quality processes provide guidance on data analysis. This section presents some straightforward techniques to assist in interpreting and responding to the information that is contained in the risk assessment data.

In using any risk assessment technique, we must recognize that knowledge is incomplete. This was addressed in Chapter 1 in a discussion of rare occurrence events and predictions of future events using historical data. Risk weightings, interactions, consequences, and scores are by necessity based on assumptions. Ideally, the assumptions are supported by sound engineering judgment and hundreds of person-years of pipeline experience. Yet in the final analysis, high levels of uncertainty will be present. Uncertainty is present to some degree in any measurement. Chapter 1 provides some guidance in minimizing the measurement inconsistencies. Recognizing and compensating for the uncertainty is critical in proper data analysis.

The data set to be analyzed will normally represent only a small sample of the whole “population” of data in which we are really interested. If we think of the population of data as all risk scores, past, present, and future, then the data sample to be analyzed can be seen as a “snapshot.” This snapshot is to be used to predict future occurrences and make resource allocation decisions accordingly.

The objective of data analysis is to obtain and communicate information about the risk of a given pipeline. A certain disservice is done when a single risk score is offered as the answer. A risk score is meaningful only in relation to other risk scores or to some correlated absolute risk value. Even if scores are closely correlated to historical accident data, the number only represents one possibility in the context of all other numbers representing slightly different conditions. This necessitates the use of multiple values to really understand the risk picture.

The application of some simple graphical and statistical techniques changes columns and rows of numbers into trends, central tendencies, and action/decision points. More information is extracted from numbers by proper data analysis, and the common mistake of “imagining information when none exists” is avoided. Although very sophisticated analysis techniques are certainly available, the reader should consider the costs of such techniques, their applicability to this type of data, and the incremental benefit (if any) from their use. As with all aspects of risk management, the benefits of the data analysis must outweigh the costs of the analysis.

When presented with almost any set of numbers, the logical first step is to make a “picture” of the numbers. It is sometimes wise to do this even before summary statistics (average, standard deviation, etc.) are calculated. A single statistic, such as the average, is rarely enough to draw meaningful conclusions about a data set. At a minimum, a calculated measure of central tendency and a measure of variation are both required. On the other hand, a chart or graph can at a glance give the viewer a feel for how the numbers are “behaving.” The use of graphs and charts to better understand data sets is discussed in a following section.

To facilitate the discussion of graphs and statistics, a few simple statistical measures will be reviewed. To help analyze the data, two types of measurements will be of most use: measures of central tendency and measures of variation.

Measure of central tendency

This class of measurements tells us where the “center of the data” lies. The two most common measures are the average (or arithmetic mean, or simply mean) and the median. These are often confused. The average is the sum of all the values divided by the number of values in the data set. The mean is often used interchangeably with the average, but is better reserved for use when the entire population is being modeled. That is, the average is a calculated value from an actual data set while the mean is the average for the entire population of data. Because we will rarely have perfect knowledge of a population, the population mean is usually estimated from the average of the sample data.

There is a useful rule of thumb regarding the average and a histogram (histograms are discussed in a following section): The average will always be the balance point of a histogram. That is, if the x axis were a board and the frequency bars were stacks of bricks on the board, the point at which the board would balance horizontally is the average. The application of this relationship is discussed later.

The second common measure of central tendency, the median, is often used in data such as test scores, house prices, and salaries. The median yields important information especially when used with the average. The median is the point at which there are just as many values above as below. Unlike the average, the median is insensitive to extreme values, either very high or very low numbers. The average of a data set can be dramatically affected by one or two values being very high or very low. The median will not be affected.

A third, less commonly used measure of central tendency is the mode. The mode is simply the most frequently occurring value. From a practical viewpoint, the mode is often the best predictor of the value that may occur next.

An important concept for beginners to remember is that these three values are not necessarily the same. In a normal or bell-shaped distribution, possibly the most commonly seen distribution, they are all the same, but this is not the case for other common distributions. If all three are known, then the data set is already more interpretable than if only one or two are known.
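
The three measures of central tendency, and the balance-point rule for the average, can be illustrated with a short Python sketch. The risk scores below are hypothetical values invented for illustration, and the helper name net_moment is likewise an assumption, not something taken from the risk model itself. The balance check treats each distinct score as its own histogram bar; with wider bins the balance point would only approximate the average.

```python
from collections import Counter
from statistics import mean, median, mode

# Hypothetical risk scores for ten pipeline sections (illustrative only).
scores = [55, 60, 62, 62, 62, 65, 68, 70, 74, 82]

# Same data, but with the highest score replaced by one extreme value.
with_outlier = scores[:-1] + [282]


def net_moment(values, pivot):
    """Sum of (frequency x distance from pivot) over all histogram bars.

    A result of zero means the "board" balances at the pivot.
    """
    histogram = Counter(values)  # one bar per distinct value
    return sum(count * (value - pivot) for value, count in histogram.items())


print("average:", mean(scores))   # 66
print("median: ", median(scores))  # 63.5
print("mode:   ", mode(scores))    # 62

# One extreme value moves the average a long way but leaves the median alone.
print("average with outlier:", mean(with_outlier))   # 86
print("median with outlier: ", median(with_outlier))  # 63.5

# The histogram balances at the average, not at the median.
print("moment about average:", net_moment(scores, mean(scores)))    # 0
print("moment about median: ", net_moment(scores, median(scores)))  # 25.0
```

In this sample the average (66), median (63.5), and mode (62) all differ, which is typical of a data set that is not symmetric about its center.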