Page 230 - Mechanical Engineers' Handbook (Volume 2)


4 Data Conditioning 219

repeated. There are many techniques for compressing data, covered elsewhere. Zip
files are a common instance of compression, used to make the data consume less storage
and take less time in transmission from one computer to another. Compression
tends to increase the storage and retrieval time slightly. Increasingly, file systems
associated with common operating systems include compression as a standard option
or feature of mass storage. These systems are quite good at compressing repeated data
but are less effective when the data vary yet follow a mathematical relationship, such as a
straight line between two points.
• Normalize the data. When the developer knows the relationships between data, redundancy
can be avoided by normalizing the data, that is, by following some basic principles to
organize the data in such a way that redundancy is avoided. C. J. Date describes the
levels of normalization of data in a relational database (Ref. 5). For instance, if a person has
several addresses, then one could store the person's name once, store each address once,
and store the links from the person to each address. While very similar to compression,
normalization relies on the developer identifying and taking advantage of the relationships
between data to eliminate redundancy and reduce space. This requires significant effort in
planning for acquisition and storage of data. It pays off in reduced storage and significantly
improved retrieval and analysis times.
• Eliminate nonessential data. If one is not interested in the shape of a sinusoidal signal,
for instance, but only in how many cycles occurred during a given time
frame, then sampling techniques can be used to characterize the data without having
to store significant amounts of data.
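
As a minimal sketch of eliminating nonessential data, the following Python example counts the cycles of a sampled signal without storing the waveform. It assumes one cycle corresponds to one rising crossing of a threshold; the names `count_cycles` and `sine_stream` and the test signal parameters are illustrative and do not come from the handbook.

```python
import math


def count_cycles(samples, threshold=0.0):
    """Count rising crossings of `threshold` in a stream of samples.

    Only the previous sample and the running count are held in memory,
    so the waveform itself is never stored.
    """
    cycles = 0
    previous = None
    for value in samples:
        # A rising crossing: the last sample was below the threshold
        # and the current one is at or above it.
        if previous is not None and previous < threshold <= value:
            cycles += 1
        previous = value
    return cycles


def sine_stream(frequency_hz, sample_rate_hz, duration_s, phase_rad=0.0):
    """Yield samples of a unit-amplitude sine wave one at a time (test signal)."""
    n = int(sample_rate_hz * duration_s)
    for i in range(n):
        yield math.sin(2.0 * math.pi * frequency_hz * i / sample_rate_hz + phase_rad)


if __name__ == "__main__":
    # 5 Hz signal sampled at 1 kHz for 2 s, starting at its negative peak.
    signal = sine_stream(5.0, 1000.0, 2.0, phase_rad=-math.pi / 2)
    print("cycles counted:", count_cycles(signal))
```

A real signal with noise near the threshold would likely need hysteresis (separate rising and falling thresholds) to avoid counting spurious crossings.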

The engineer or researcher has to make assumptions about how the data will be used
and factor those into the acquisition and storage system. A project attempting to discover
the relationships between waveforms would require high-frequency sampling and probably
time-based storage. A project attempting to record the number of times a boiler went over
a certain temperature level might have a high-speed scanning capability but only store those
values that were above the temperature limit. An inventory tracking system may have triggers
that cause scanning only when some event occurs.
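
The boiler case above can be sketched as follows: every reading is scanned, but only readings above the limit are kept. This is an illustrative sketch only; the function name `record_exceedances`, the `(timestamp, value)` reading format, and the sample values are assumptions, not part of the handbook.

```python
def record_exceedances(readings, limit):
    """Scan every (timestamp, value) reading but store only those above `limit`.

    Returns the stored readings and the number of separate excursions,
    where an excursion is a transition from at-or-below the limit to above it.
    """
    stored = []
    excursions = 0
    was_above = False
    for timestamp, value in readings:
        is_above = value > limit
        if is_above:
            stored.append((timestamp, value))
            if not was_above:
                excursions += 1  # a new over-limit event has started
        was_above = is_above
    return stored, excursions


if __name__ == "__main__":
    # Hypothetical temperature readings: (seconds, degrees)
    readings = [(0, 95.0), (1, 101.5), (2, 103.0), (3, 99.0), (4, 100.0), (5, 100.2)]
    stored, excursions = record_exceedances(readings, limit=100.0)
    print(stored)
    print("excursions:", excursions)
```

Because `readings` may be any iterable, the same function works on a live stream from a scanner as well as on a list, and storage grows only with the number of over-limit readings.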
Often, a batch or pallet of product may contain a large number of items. The items can
be sampled for some process attribute. The customer may want to know summary statistics
about the pallet, but the storage of all the data may not be feasible. In this case, statistical
results can normally be derived from summary data. Average, standard deviation, total, cor-
relation, maximum, and minimum are easily calculated from summary, accumulated data*:

Averages. Keep a running sum of data and count of readings:

Average = Sum Of Values / Count Of Values

Totals. Keep a running sum of data.
Standard Deviation

$$\sqrt{\frac{n\sum_{i=1}^{n} x_i^{2} - \left(\sum_{i=1}^{n} x_i\right)^{2}}{n(n-1)}}$$

*Standard deviation and correlation from Ref. 6, pp. 473, 477.
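
A minimal Python sketch of the accumulated-data approach follows. It keeps only a count, a running sum, a running sum of squares, and the extremes, and derives the sample standard deviation as the square root of (n times the sum of squares, minus the square of the sum) divided by n(n - 1). The class name `RunningSummary` and the sample readings are illustrative assumptions.

```python
import math


class RunningSummary:
    """Accumulate summary data so that individual readings need not be stored."""

    def __init__(self):
        self.count = 0
        self.total = 0.0      # running sum of values
        self.total_sq = 0.0   # running sum of squared values
        self.minimum = math.inf
        self.maximum = -math.inf

    def add(self, x):
        """Fold one reading into the accumulators."""
        self.count += 1
        self.total += x
        self.total_sq += x * x
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    def average(self):
        """Average = sum of values / count of values."""
        if self.count == 0:
            raise ValueError("no readings accumulated")
        return self.total / self.count

    def std_dev(self):
        """Sample standard deviation from the accumulated sums."""
        n = self.count
        if n < 2:
            raise ValueError("at least two readings are required")
        numerator = n * self.total_sq - self.total ** 2
        # Guard against a tiny negative value caused by rounding error.
        return math.sqrt(max(numerator, 0.0) / (n * (n - 1)))


if __name__ == "__main__":
    summary = RunningSummary()
    for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:  # hypothetical data
        summary.add(reading)
    print(summary.count, summary.total, summary.average())
    print(summary.minimum, summary.maximum, summary.std_dev())
```

One caution on this design: the sum-of-squares form can lose precision when the mean is large compared with the spread of the data, because two nearly equal numbers are subtracted. Welford's online algorithm is a commonly used alternative in that situation.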
