Page 202 - Pipeline Risk Management Manual Ideas, Techniques, and Resources
Data preparation 8/179
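The standard event-record layout (ID, Begstation, Endstation, Desc, Code, notes) and the restricted vocabulary discussed on this page can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the software the text refers to. The class `EventRecord`, the function `validate`, and the contents of the `VOCABULARY` table are hypothetical, and treating the notes field as optional free text is also an assumption. Only exact, predefined terms pass, so entries such as "Republic" or "ok" are rejected.

```python
from dataclasses import dataclass

# Hypothetical restricted vocabulary: the only condition terms (Code)
# permitted for each event (Desc). A real list would come from the data dictionary.
VOCABULARY = {
    "pipe manufacturer": {"Republic Steel Corp"},
    "coating condition": {"good", "fair", "poor"},
}


@dataclass(frozen=True)
class EventRecord:
    """One row of the events table: a condition for one event over a
    stretch of one pipeline, in a single consistent distance system."""
    id: str            # identifier of a specific length of pipeline
    begstation: float  # beginning point of this event and condition
    endstation: float  # end point, same measurement system
    desc: str          # name of the event
    code: str          # the condition
    notes: str = ""    # assumed: optional free-text comments


def validate(rec: EventRecord) -> list[str]:
    """Return a list of problems; an empty list means the record is accepted."""
    problems = []
    allowed = VOCABULARY.get(rec.desc)
    if allowed is None:
        problems.append(f"unknown event {rec.desc!r}")
    elif rec.code not in allowed:  # exact match only: "ok", "F", "Republic" fail
        problems.append(f"{rec.code!r} is not a permitted condition for {rec.desc!r}")
    if rec.endstation < rec.begstation:
        problems.append("endstation precedes begstation")
    return problems


good = EventRecord("L-100", 0.0, 2500.0, "coating condition", "fair")
bad = EventRecord("L-100", 0.0, 2500.0, "pipe manufacturer", "Republic")
print(validate(good))  # []
print(validate(bad))   # ["'Republic' is not a permitted condition for 'pipe manufacturer'"]
```

In a database application, the same restriction is usually enforced with a pick list or a lookup table rather than in code.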
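The translation routine between alignment-sheet stationing and linear measurement, mentioned on this page, can be sketched as below. Several assumptions apply:

- Each stationing equation is recorded as a hypothetical (back, ahead) pair, meaning that at one physical point the back station on the old stationing equals the ahead station on the new stationing.
- Stations are in feet.
- Equations are listed in route order.

The same station value can occur on both sides of an equation, so a station alone is ambiguous. For that reason `station_to_linear` also takes the stationing region (0 before the first equation). The function names and sample equations are illustrative only.

```python
from bisect import bisect_right
from itertools import accumulate

# Hypothetical stationing equations, in route order: (back, ahead), in feet.
# (1200, 1000): station 1200 back = station 1000 ahead (stationing drops 200 ft).
# (3000, 3100): station 3000 back = station 3100 ahead (stationing jumps 100 ft).
EQUATIONS = [(1200.0, 1000.0), (3000.0, 3100.0)]


def _offsets(equations):
    """Offset added to a station in each region (region 0 = before any equation)."""
    return [0.0] + list(accumulate(back - ahead for back, ahead in equations))


def station_to_linear(station, region, equations=EQUATIONS):
    """Linear distance for a station read in the given stationing region."""
    return station + _offsets(equations)[region]


def linear_to_station(linear, equations=EQUATIONS):
    """Inverse: (station, region) for a linear distance along the pipeline."""
    offsets = _offsets(equations)
    # Linear position of each equation point, where the stationing jumps.
    breaks = [back + offsets[i] for i, (back, _) in enumerate(equations)]
    region = bisect_right(breaks, linear)
    return linear - offsets[region], region


print(station_to_linear(1100.0, 0))  # 1100.0
print(station_to_linear(1100.0, 1))  # 1300.0 (same station, different place)
print(linear_to_station(3500.0))     # (3400.0, 2)
```

The two calls for station 1100 show why stationing equations complicate identifiers: one station label maps to two different linear positions.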
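The text on this page notes that separate event data sets must be combined for segmentation. One common approach, dynamic segmentation, treats every Begstation and Endstation as a potential break point. Each resulting segment then carries the condition of every event that covers it. The sketch below makes these assumptions:

- The records belong to a single pipeline ID.
- The events are continuous, not point events.
- Records within each event do not overlap.

The function `segment` and the sample records are hypothetical.

```python
def segment(records):
    """Split one pipeline into segments within which every event's condition
    is constant. records: iterable of (begstation, endstation, desc, code)."""
    records = list(records)
    # Every beginning or ending station is a potential break point.
    breaks = sorted({s for beg, end, _, _ in records for s in (beg, end)})
    segments = []
    for start, stop in zip(breaks, breaks[1:]):
        # Condition of each event whose record fully covers this segment;
        # an event with a data gap here is simply absent from the dict.
        conditions = {desc: code for beg, end, desc, code in records
                      if beg <= start and stop <= end}
        segments.append((start, stop, conditions))
    return segments


coating = [(0, 2500, "coating condition", "fair"),
           (2500, 5280, "coating condition", "good")]
environ = [(0, 1000, "environ sensitivity", "low"),
           (1000, 5280, "environ sensitivity", "high")]
for seg in segment(coating + environ):
    print(seg)
# (0, 1000, {'coating condition': 'fair', 'environ sensitivity': 'low'})
# (1000, 2500, {'coating condition': 'fair', 'environ sensitivity': 'high'})
# (2500, 5280, {'coating condition': 'good', 'environ sensitivity': 'high'})
```

Keeping coating and environmental sensitivity in separate lists mirrors keeping separate data sets during collection; they are merged only at segmentation time.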
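The use of SQL queries to combine separate data sets and attach point values, discussed on this page, can be tried with Python's built-in `sqlite3` module. The tables follow the standard record format described in the text. The table names are hypothetical, as is every point value except "High" environ sensitivity = 3.

The query stacks the per-event tables with `UNION ALL`. It then joins a lookup table so that each row keeps its descriptive condition alongside its numeric score. The column `desc` is quoted because DESC is an SQL keyword.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# One table per event (separate data sets) plus a point-value lookup table.
con.executescript("""
CREATE TABLE coating (id TEXT, begstation REAL, endstation REAL, "desc" TEXT, code TEXT);
CREATE TABLE environ (id TEXT, begstation REAL, endstation REAL, "desc" TEXT, code TEXT);
CREATE TABLE points ("desc" TEXT, code TEXT, points INTEGER);
""")
con.executemany("INSERT INTO coating VALUES (?, ?, ?, ?, ?)", [
    ("L-100", 0.0, 2500.0, "coating condition", "fair"),
    ("L-100", 2500.0, 5280.0, "coating condition", "good"),
])
con.executemany("INSERT INTO environ VALUES (?, ?, ?, ?, ?)", [
    ("L-100", 0.0, 5280.0, "environ sensitivity", "high"),
])
# Hypothetical point schedule; only high environ sensitivity = 3 comes from the text.
con.executemany("INSERT INTO points VALUES (?, ?, ?)", [
    ("coating condition", "good", 1),
    ("coating condition", "fair", 2),
    ("environ sensitivity", "high", 3),
])

# Combine the separate data sets, then look up each condition's point value.
QUERY = """
SELECT e.id, e.begstation, e.endstation, e."desc", e.code, p.points
FROM (SELECT * FROM coating UNION ALL SELECT * FROM environ) AS e
JOIN points AS p ON p."desc" = e."desc" AND p.code = e.code
ORDER BY e.id, e.begstation, e."desc"
"""
rows = con.execute(QUERY).fetchall()
for row in rows:
    print(row)
# ('L-100', 0.0, 2500.0, 'coating condition', 'fair', 2)
# ('L-100', 0.0, 5280.0, 'environ sensitivity', 'high', 3)
# ('L-100', 2500.0, 5280.0, 'coating condition', 'good', 1)
```

Storing point values in a table rather than in the query keeps the scoring rules editable alongside the data dictionary.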
Several industry database design standards are emerging as of this writing. Adhering to a standard model facilitates the efficient exchange of data with vendors and service providers (ILI, CIS, etc.), as well as with other pipeline companies and governmental databases.

Each event must have a condition assigned. Some conditions can be assigned as general defaults or as a system-wide characteristic. Each event-condition combination defines a risk characteristic for a portion of the system.

The most robust software applications enforce a restricted vocabulary: only predefined terms can be used to characterize events. This eliminates typos and the use of different conditions to mean the same thing. For instance, pipe manufacturer = “Republic Steel Corp” and not “Republic,” “Republic Steel,” “republic,” or “RSC”; coating condition = “fair” and not “F,” “ok,” “medium,” or “med.”

The data dictionary is a document that lists all events and their underlying sources, as well as all risk variables. It should also show all conditions used for each event, along with the full description of each condition and its corresponding point values. The data dictionary is designed to be a reference and control document for the risk assessment. For each piece of data, it should specify the owner (the person responsible for the data), the update frequency, the accuracy, and other pertinent information. This information is sometimes called metadata.

In common database terminology, each row of data is called a record and each column is called a field. Each record is therefore composed of several fields, and each field holds one type of information for every record. A collection of records and fields can be called a database, a data set, or a table. Information will usually be collected and put into a database (a spreadsheet can be a type of database). Results of risk assessments will also normally be put into a database environment.

A GIS (geographical information system) combines database capabilities with graphics capabilities, especially maps. GIS is increasingly the software environment of choice for assets that span large geographic areas. Most GIS environments have a programming language that can extract data and combine them according to the rules of an algorithm. In more detailed risk assessments, common GIS applications include modeling flowpath or dispersion distances and directions, surface flow resistance, soil penetration, and hazard zones. A GIS can also serve as the calculating “engine” for producing risk scores.

SQL (Structured Query Language) is a language recognized by most database software. Using SQL, a query can be written to extract certain information from the database, or to combine or present information in a certain way. SQL can therefore take individual pieces of data from the database and apply the rules of the algorithm to generate risk scores.

IV. Data preparation

Data collection and format

Pertinent risk data will come from a variety of sources. Older data are often in paper form and will probably need to be converted to electronic format. It is not uncommon to find many different identification systems in use:

- some linked to original alignment sheets;
- some based on linear measurements from fixed points;
- some based on coordinate systems such as the Global Positioning System (GPS).

Alignment sheets normally use stationing equations to capture adjustments and changes in the pipeline route. These equations often complicate identifiers, because a station shown on an alignment sheet will often be inconsistent with the linear measurements taken in most surveys. Information must therefore either be converted to a standard format, or translation routines must be used to switch between alignment sheet stationing and linear measurements.

All input information should be collected in a standard data format with common field (column) names. A standard data format can be specified either for collection or for later reformatting. Consider this example:

ID    Begstation    Endstation    Desc    Code    Notes

where

ID = identifier relating to a specific length of pipeline
Begstation = the beginning point for a specific event and condition, using a consistent distance-measuring system
Endstation = the end point for a specific event and condition, using the same measurement system
Desc = the name of the event
Code = the condition

Each record in the initial events database therefore corresponds to an event that reports a condition for some risk variable over a specific distance along a specific pipeline.

In data collection and compilation, an evaluator may wish to keep separate data sets (perhaps a different data set for each event, or for each event in each operating area) for ease of editing and maintenance during the data collection process. The number of separate data sets created to hold all the information is largely a matter of preference:

- Few data sets make each one easier to track. However, each becomes rather large and slow to process, and specific pieces of information may be harder to find.
- Many data sets are each smaller and quicker to process, and each contains only a few information types. However, managing many smaller data sets may be more problematic.

Especially where the number of event records is not large, maintaining separate data sets might not be beneficial.

Separate data sets will need to be combined for purposes of segmentation and assignment of risk scores. This combining can be done efficiently with queries in the SQL of most common database software.

A scoring assessment requires the assignment of a numerical value corresponding to each condition. For example, in one risk model the event “environ sensitivity” is scored as “High,” which equals a value of 3 points. It is also useful to preserve the more descriptive condition (high, med, low, etc.).

Point events and continuous data

There is a distinction between data representing a specific point and data representing a continuous condition over a length of pipeline. Continuous data always have a beginning and an ending station number. A condition that stays generally constant over longer distances is clearly continuous data. Point event data have a beginning station number but no ending station; that is,