Page 212 - Pipeline Risk Management Manual Ideas, Techniques, and Resources

Data analyses 8/189
results of the risk assessment and should understand and agree with all underlying assumptions, calculations, and protocols employed.

This book may provide some of the background documentation necessary for a software program that incorporates a model similar to the one described here. It contains explanations as to why and how certain variables are given more points than others and why certain variables are considered at all. Where the book may provide the rationale behind the risk assessment, the software documentation must additionally note the workings of all routines, the structure of the data, and all aspects of the program. A data dictionary is normally included in the software documentation.

IX. Data analysis

An earlier chapter made a connection between the quality process (total quality management, continuous improvement, etc.) and risk management. In striving to truly understand work processes, measurement becomes increasingly important. Once measurement is done, analysis of the resulting data is the next step. Here again, the connection between quality and risk is useful. Quality processes provide guidance on data analysis. This section presents some straightforward techniques to assist in interpreting and responding to the information that is contained in the risk assessment data.

In using any risk assessment technique, we must recognize that knowledge is incomplete. This was addressed in Chapter 1 in a discussion of rare occurrence events and predictions of future events using historical data. Risk weightings, interactions, consequences, and scores are by necessity based on assumptions. Ideally, the assumptions are supported by sound engineering judgment and hundreds of person-years of pipeline experience. Yet in the final analysis, high levels of uncertainty will be present. Uncertainty is present to some degree in any measurement. Chapter 1 provides some guidance in minimizing the measurement inconsistencies. Recognizing and compensating for the uncertainty is critical in proper data analysis.

The data set to be analyzed will normally represent only a small sample of the whole “population” of data in which we are really interested. If we think of the population of data as all risk scores, past, present, and future, then the data sample to be analyzed can be seen as a “snapshot.” This snapshot is to be used to predict future occurrences and make resource allocation decisions accordingly.

The objective of data analyses is to obtain and communicate information about the risk of a given pipeline. A certain disservice is done when a single risk score is offered as the answer. A risk score is meaningful only in relation to other risk scores or to some correlated absolute risk value. Even if scores are closely correlated to historical accident data, the number only represents one possibility in the context of all other numbers representing slightly different conditions. This necessitates the use of multiple values to really understand the risk picture.

The application of some simple graphical and statistical techniques changes columns and rows of numbers into trends, central tendencies, and action/decision points. More information is extracted from numbers by proper data analysis, and the common mistake of “imagining information when none exists” is avoided. Although very sophisticated analysis techniques are certainly available, the reader should consider the costs of such techniques, their applicability to this type of data, and the incremental benefit (if any) from their use. As with all aspects of risk management, the benefits of the data analysis must outweigh the costs of the analysis.

When presented with almost any set of numbers, the logical first step is to make a “picture” of the numbers. It is sometimes wise to do this even before summary statistics (average, standard deviation, etc.) are calculated. A single statistic, such as the average, is rarely enough to draw meaningful conclusions about a data set. At a minimum, a calculated measure of central tendency and a measure of variation are both required. On the other hand, a chart or graph can at a glance give the viewer a feel for how the numbers are “behaving.” The use of graphs and charts to better understand data sets is discussed in a following section.

To facilitate the discussion of graphs and statistics, a few simple statistical measures will be reviewed. To help analyze the data, two types of measurements will be of most use: measures of central tendency and measures of variation.

Measure of central tendency

This class of measurements tells us where the “center of the data” lies. The two most common measures are the average (or arithmetic mean, or simply mean) and the median. These are often confused. The average is the sum of all the values divided by the number of values in the data set. The mean is often used interchangeably with the average, but is better reserved for use when the entire population is being modeled. That is, the average is a calculated value from an actual data set while the mean is the average for the entire population of data. Because we will rarely have perfect knowledge of a population, the population mean is usually estimated from the average of the sample data.

There is a useful rule of thumb regarding the average and a histogram (histograms are discussed in a following section): The average will always be the balance point of a histogram. That is, if the x axis were a board and the frequency bars were stacks of bricks on the board, the point at which the board would balance horizontally is the average. The application of this relationship is discussed later.

The second common measure of central tendency, the median, is often used in data such as test scores, house prices, and salaries. The median yields important information especially when used with the average. The median is the point at which there are just as many values above as below. Unlike the average, the median is insensitive to extreme values, either very high or very low numbers. The average of a data set can be dramatically affected by one or two values being very high or very low. The median will not be affected.

A third, less commonly used measure of central tendency is the mode. The mode is simply the most frequently occurring value. From a practical viewpoint, the mode is often the best predictor of the value that may occur next.

An important concept for beginners to remember is that these three values are not necessarily the same. In a normal or bell-shaped distribution, possibly the most commonly seen distribution, they are all the same, but this is not the case for other common distributions. If all three are known, then the data set is already more interpretable than if only one or two are known.
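
As an illustration only, the three measures of central tendency can be computed for a small set of risk scores with a few lines of Python. The scores below are hypothetical values invented for this sketch, and the helper names `summarize` and `balance_point` are likewise assumptions, not part of any model described in this book. The sketch also checks the histogram rule of thumb, treating each distinct score as its own histogram bar; with wider bins the balance point would only approximate the average.

```python
from collections import Counter
from statistics import mean, median, mode

# Hypothetical risk scores for ten pipeline sections (illustrative values only).
scores = [42, 45, 45, 47, 48, 50, 51, 53, 55, 64]


def summarize(values):
    """Return the average, median, and mode of a list of numbers."""
    return {
        "average": mean(values),
        "median": median(values),
        "mode": mode(values),
    }


def balance_point(bar_positions, bar_heights):
    """Balance point of a histogram: the height-weighted mean of the bar positions."""
    total_moment = sum(x * h for x, h in zip(bar_positions, bar_heights))
    return total_moment / sum(bar_heights)


base = summarize(scores)

# Replace the highest score with one extreme value to mimic a single outlier.
with_outlier = summarize(scores[:-1] + [300])

# Build a histogram with one bar per distinct score and find where it balances.
histogram = Counter(scores)
pivot = balance_point(list(histogram.keys()), list(histogram.values()))

print("original data:", base)
print("with outlier: ", with_outlier)
print("histogram balance point:", pivot)
```

For these hypothetical scores the average is 50, the median is 49, and the mode is 45, so the three measures already differ. Substituting the single extreme value moves the average to 73.6 while the median and mode stay where they were, and the balance point of the histogram coincides with the average of the original data.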