Page 202 - Pipeline Risk Management Manual Ideas, Techniques, and Resources

Several industry database design standards are emerging as of this writing. Adhering to a standard model facilitates the efficient exchange of data with vendors and service providers (ILI, CIS, etc.), as well as with other pipeline companies and governmental databases.

Each event must have a condition assigned. Some conditions can be assigned as general defaults or as system-wide characteristics. Each event-condition combination defines a risk characteristic for a portion of the system.

A restricted vocabulary is enforced in the most robust software applications: only predefined terms can be used to characterize events. This eliminates typos and the use of different conditions to mean the same thing. For instance, for the event pipe manufacturer, the condition is "Republic Steel Corp" and not "Republic," "Republic Steel," "republic," or "RSC"; for the event coating condition, it is "fair" and not "F," "ok," "medium," or "med."

The data dictionary is a document that lists all events and their underlying sources, as well as all risk variables. It should also show all conditions used for each event, along with the full description of each condition and its corresponding point values. The data dictionary is designed to be a reference and control document for the risk assessment. It should specify the owner (the person responsible for the data) as well as the update frequency, accuracy, and other pertinent information about each piece of data, sometimes called metadata.

In common database terminology, each row of data is called a record and each column is called a field. Each record is therefore composed of several fields, and each field holds one type of information for every record. A collection of records and fields can be called a database, a data set, or a table. Information will usually be collected and put into a database (a spreadsheet can be a type of database). Results of risk assessments will also normally be put into a database environment.

A GIS (geographic information system) combines database capabilities with graphics capabilities, especially maps. GIS is increasingly the software environment of choice for assets that span large geographic areas. Most GIS environments have a programming language that can extract data and combine them according to the rules of an algorithm. In more detailed risk assessments, common GIS applications include modeling flowpath or dispersion distances and directions, surface flow resistance, soil penetration, and hazard zone calculations. A GIS can also serve as the calculating "engine" for producing risk scores.

SQL (Structured Query Language) is a query language recognized by most database software. Using SQL, a query can be created to extract certain information from the database or to combine or present information in a certain way. SQL can therefore take individual pieces of data from the database and apply the rules of the algorithm to generate risk scores.

IV. Data preparation

Data collection and format

Pertinent risk data will come from a variety of sources. Older data will often be in paper form and will probably need to be converted to electronic format. It is not uncommon to find many different identification systems, with some linked to original alignment sheets, some based on linear measurements from fixed points, and some based on coordinate systems, such as the Global Positioning System (GPS). Alignment sheets normally use stationing equations to capture adjustments and changes in the pipeline route. These equations often complicate identifiers, since a station shown on an alignment sheet will often be inconsistent with the linear measurements taken in most surveys. Information will therefore need to be put into a standard format, or translation routines can be used to convert between alignment sheet stationing and linear measurements.

All input information should be collected in a standard data format with common field (column) names. A standard data format can be specified for collection or reformatting. Consider this example:

ID   Begstation   Endstation   Desc   Code   notes

where

ID         = identifier relating to a specific length of pipeline
Begstation = the beginning point for a specific event and condition, using a consistent distance measuring system
Endstation = the end point for a specific event and condition, using the same measurement system
Desc       = the name of the event
Code       = the condition

Each record in the initial events database therefore corresponds to an event that reports a condition for some risk variable over a specific distance along a specific pipeline.

In data collection and compilation, an evaluator may wish to keep separate data sets (perhaps a different data set for each event, or for each event in each operating area) for ease of editing and maintenance during the data collection process. The number of separate data sets created to contain all the information is largely a matter of preference. Having few data sets makes tracking each one easier. However, it also makes each one rather large and slow to process, and it may make it more difficult to find specific pieces of information. Having many data sets means each is smaller, quicker to process, and contains only a few information types. However, managing many smaller data sets may be more problematic. Especially where the number of event records is not huge, maintaining separate data sets might not be beneficial.

Separate data sets will need to be combined for purposes of segmentation and assignment of risk scores. This combining can be done efficiently through certain queries in the SQL of most common database software.

A scoring assessment requires the assignment of a numerical value corresponding to each condition. For example, in a certain risk model, the event environ sensitivity is scored as "High," which equals a value of 3 points. It is also useful to preserve the more descriptive condition (high, med, low, etc.).

Point events and continuous data

There is a distinction between data representing a specific point and data representing a continuous condition over a length of pipeline. Continuous data always have a beginning and an ending station number. A condition that stays generally constant over longer distances is clearly continuous data. Point event data have a beginning station number but no ending station; that is,