Page 231 - Mechanical Engineers' Handbook (Volume 2)
220 Data Acquisition and Display Systems
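Correlation
$$
r = \frac{n \sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{\sqrt{\left[n \sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n \sum y_i^2 - \left(\sum y_i\right)^2\right]}} \tag{3}
$$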
Range. Save largest and smallest values.
Median. Find the middle value of a distribution, which requires keeping all values.
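The correlation and range entries above can be maintained from running totals, without keeping each reading. The following is a minimal Python sketch, assuming paired readings arrive one at a time; the class and attribute names are illustrative and do not come from the handbook.

```python
import math


class RunningStats:
    """Accumulates the sums used in Eq. (3) plus the range of x and y.

    Hypothetical helper for illustration; only totals are stored,
    never the individual readings.
    """

    def __init__(self):
        self.n = 0
        self.sx = self.sy = 0.0      # sum of x, sum of y
        self.sxx = self.syy = 0.0    # sum of x^2, sum of y^2
        self.sxy = 0.0               # sum of x*y
        self.x_min = self.y_min = math.inf
        self.x_max = self.y_max = -math.inf

    def add(self, x, y):
        """Fold one (x, y) reading into the running totals."""
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y
        self.x_min, self.x_max = min(self.x_min, x), max(self.x_max, x)
        self.y_min, self.y_max = min(self.y_min, y), max(self.y_max, y)

    def correlation(self):
        """Evaluate Eq. (3) from the stored totals."""
        num = self.n * self.sxy - self.sx * self.sy
        den = math.sqrt((self.n * self.sxx - self.sx ** 2)
                        * (self.n * self.syy - self.sy ** 2))
        if den == 0:
            raise ValueError("correlation undefined: x or y has no spread")
        return num / den
```

For example, feeding in the readings (1, 2), (2, 4), (3, 6) and calling `correlation()` gives a value of 1, since the points lie on a straight line. The median has no comparable running total, which is why it is treated separately.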
The median (the true center of the data) can be calculated only from the raw data. A
compromise for depicting the distribution of data without having to store the full details is
to store a distribution of the data. For instance, the range of possible important data can be
broken into a series of totals (bins), each holding the count of items that fall within it.
A histogram representing the distribution of data can be created from the totals without
requiring the full set of original data. In addition, the median can be approximated using
this technique. The distribution can also be used to supply data for statistics based on the
distribution of data, such as the Taguchi loss function (Ref. 7, pp. 397–400).
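The idea of storing counts instead of raw values, and approximating the median from them, can be sketched as follows. This is an illustrative Python sketch only: the equal-width bins, the clamping of out-of-range values into the edge bins, and the linear interpolation inside the median bin are assumptions made for the example, and the names are hypothetical.

```python
class Histogram:
    """Equal-width bin counts over [lo, hi); stores counts, not raw values."""

    def __init__(self, lo, hi, n_bins):
        self.lo = lo
        self.width = (hi - lo) / n_bins
        self.counts = [0] * n_bins

    def add(self, value):
        """Increment the count of the bin that contains value."""
        idx = int((value - self.lo) / self.width)
        # Assumption: out-of-range readings are clamped into the edge bins.
        idx = max(0, min(len(self.counts) - 1, idx))
        self.counts[idx] += 1

    def approx_median(self):
        """Estimate the median by interpolating within the bin that
        contains the halfway point of the cumulative count."""
        total = sum(self.counts)
        if total == 0:
            raise ValueError("no data")
        half = total / 2
        cumulative = 0
        for i, count in enumerate(self.counts):
            if count > 0 and cumulative + count >= half:
                fraction = (half - cumulative) / count
                return self.lo + (i + fraction) * self.width
            cumulative += count
```

The estimate can be off by up to about one bin width, so the bin size sets the trade-off between storage and accuracy. The `counts` list is also what a histogram display would plot.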
4.5 More on Sampling and Compression
Rather than just sample the data, why not save all the changed values of the data, discarding
values that are the same as, or within some limits of, the previous reading? This approach applies
best to continuous processes. Significant space reductions can be achieved in processes
that change slowly and have only occasional large upsets. Variations of this
technique can provide additional improvements. For instance, rather than just checking to
see if the current value is the same as or within some limits of the previous reading, see
if it is on the same line or curve as the previous value. This can result in a great reduction
of storage requirements at the cost of a slight amount of accuracy in reconstruction. The
more flexible the compression technique, the more work must be done to reconstruct the
data later for examination. For instance, if the user of the data acquisition system wants to
retrieve a data point within data that has been reduced to a line segment, the user or the
system must determine which line segment is wanted using the time stamp for the beginning
and ending of the line segment interval and then recalculate the point from the equation.
This is referred to as a boxcar algorithm (Ref. 8). Values that are close to the line segment can be
treated as on the line segment if one can afford to lose some accuracy. The formula for a
boxcar has to take into account the length of the interval (maximum), the height of the box
(how much noise is allowed), and how peak or exception values are treated. For instance,
Table 2 presents a set of data with several types of compression applied for data sampled at
a constant interval.
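The two approaches described above can be sketched in Python as follows. This is a simplified illustration, not the handbook's formula: `compress_repeats` compares each reading with the last stored value (an assumption, chosen so that slow drift cannot slip through unnoticed), and `compress_boxcar` is one possible line-segment variant in which `height` is the allowed deviation from the extrapolated line and `max_len` is the maximum segment duration. Peak or exception values get no special treatment here; they simply end the current segment. All function and parameter names are hypothetical.

```python
def compress_repeats(samples, tolerance=0.0):
    """Keep a (t, v) sample only if v differs from the last stored
    value by more than tolerance."""
    stored = []
    for t, v in samples:
        if not stored or abs(v - stored[-1][1]) > tolerance:
            stored.append((t, v))
    return stored


def compress_boxcar(samples, height, max_len):
    """Reduce samples to line-segment endpoints.

    samples: list of (t, v) with strictly increasing t.
    A segment continues while each new value stays within `height`
    of the line through the segment's first two points and the
    segment is no longer than `max_len`.
    """
    if len(samples) <= 2:
        return list(samples)
    stored = [samples[0]]
    start, prev = samples[0], samples[1]
    slope = (prev[1] - start[1]) / (prev[0] - start[0])
    for t, v in samples[2:]:
        predicted = start[1] + slope * (t - start[0])
        too_long = (t - start[0]) > max_len
        if abs(v - predicted) > height or too_long:
            stored.append(prev)          # close segment at last fitting point
            start = prev
            slope = (v - start[1]) / (t - start[0])
        prev = (t, v)
    stored.append(prev)                  # close the final segment
    return stored


def reconstruct(stored, t):
    """Recover the value at time t by locating the enclosing segment
    from its time stamps and evaluating the line through its endpoints."""
    for (t0, v0), (t1, v1) in zip(stored, stored[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t lies outside the stored segments")
```

As a usage example, a ramp followed by a flat stretch, `[(t, t) for t in range(11)] + [(t, 10) for t in range(11, 21)]`, has 21 samples. `compress_repeats` keeps 11 of them because every point on the ramp differs from the last, while `compress_boxcar(data, height=0.1, max_len=1000)` keeps only the three endpoints (0, 0), (10, 10), and (20, 10). The extra work appears at retrieval time, where `reconstruct` must search for the segment and evaluate the line.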
In Table 2, the simple repeating-value compression will not lose data but will result in
little or no compression if the data change value frequently or contain any noise.
The boxcar compression technique results in much higher compression for slowly changing
data, with only the loss of fine detail (depending on the height of the window). For data
that are nonlinear or changing frequently, the boxcar compression method results in little
compression. Process information systems often use the boxcar compression method. If data
are slow moving with occasional bursts of activity, the boxcar and repeating-value methods
can result in dramatic reductions in space required. If data changes tend to be linear, then
the boxcar algorithm tends to be superior to the repeating-value approach. For an extreme
example, see Table 3.
The raw data would have resulted in 631 data points being stored. The boxcar method
would result in 5 data points being stored, less than 1% of the storage required. In the