Page 230 - Mechanical Engineers' Handbook (Volume 2)

4 Data Conditioning  219

                                repeated. There are many techniques for compressing data, covered elsewhere. Zip
                                files are a common instance of compression to make the data consume less storage
                                and to take less time in transmission from one computer to another. Compression
                                tends to increase the storage and retrieval time slightly. Increasingly, file systems
                                associated with common operating systems include compression as a standard option
                                or feature of mass storage. These systems are quite good at compressing repeated data
                                but are less effective when data vary but have a mathematical relationship, such as a
                                straight line between two points.
                              • Normalize the data. When the developer knows the relationships between data,
                                redundancy can be avoided by normalizing the data, that is, by organizing it
                                according to a few basic principles. C. J. Date describes the levels of
                                normalization of data in a relational database (Ref. 5). For instance, if a person
                                has several addresses, then one could store the person’s name once, store each
                                address once, and store the links from the person to each address. While very
                                similar to compression, normalization relies on the developer identifying and taking
                                advantage of the relationships between data to eliminate redundancy and reduce
                                space. This requires significant effort in planning for acquisition and storage of
                                data. It pays off in reduced storage and significantly improved retrieval and
                                analysis times.
                              • Eliminate nonessential data. If one is not interested in the shape of a sinusoidal signal,
                                for instance, but only in how many cycles occurred during a given time frame, then
                                sampling techniques can be used to characterize the data without having to store
                                every reading.
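
The person-and-address example from the normalization item can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the table names, column names, and sample rows are hypothetical and do not come from the handbook or from Date's text.

```python
import sqlite3

# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: the name is stored once, each address once,
# and a link table records which person has which address.
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE address (
    address_id INTEGER PRIMARY KEY,
    street     TEXT NOT NULL,
    city       TEXT NOT NULL
);
CREATE TABLE person_address (
    person_id  INTEGER NOT NULL REFERENCES person(person_id),
    address_id INTEGER NOT NULL REFERENCES address(address_id),
    PRIMARY KEY (person_id, address_id)
);
""")

# Made-up sample data: one person with two addresses.
conn.execute("INSERT INTO person VALUES (1, 'A. Smith')")
conn.executemany(
    "INSERT INTO address VALUES (?, ?, ?)",
    [(1, "12 Mill Rd", "Springfield"), (2, "PO Box 7", "Shelbyville")],
)
conn.executemany("INSERT INTO person_address VALUES (?, ?)", [(1, 1), (1, 2)])


def addresses_for(connection, name):
    """Return (street, city) rows for a person by following the link table."""
    query = """
        SELECT a.street, a.city
        FROM person AS p
        JOIN person_address AS pa ON pa.person_id = p.person_id
        JOIN address AS a ON a.address_id = pa.address_id
        WHERE p.name = ?
        ORDER BY a.address_id
    """
    return connection.execute(query, (name,)).fetchall()
```

Calling `addresses_for(conn, "A. Smith")` walks the links and returns both addresses, while the name itself occupies a single row no matter how many addresses are added.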


                              The engineer or researcher has to make assumptions about how the data will be used
                           and factor those into the acquisition and storage system. A project attempting to discover
                           the relationships between waveforms would require high-frequency sampling and probably
                           time-based storage. A project attempting to record the number of times a boiler went over
                           a certain temperature level might have a high-speed scanning capability but only store those
                           values that were above the temperature limit. An inventory tracking system may have triggers
                           that cause scanning only when some event occurs.
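
The boiler case above, scanning every reading but storing only those over the limit, might look like the following minimal sketch. The function name, the `(timestamp, value)` tuple layout, and the sample readings are assumptions made for illustration only.

```python
from typing import Iterable, List, Tuple

Reading = Tuple[float, float]  # (timestamp, temperature), hypothetical layout


def store_exceedances(readings: Iterable[Reading], limit: float) -> Tuple[List[Reading], int]:
    """Scan every reading, keep only those above `limit`, and count excursions.

    An excursion is counted each time the value rises above the limit
    after having been at or below it.
    """
    stored: List[Reading] = []
    excursions = 0
    above = False  # whether the previous reading was above the limit
    for timestamp, value in readings:
        if value > limit:
            stored.append((timestamp, value))
            if not above:
                excursions += 1  # a new over-limit event starts here
            above = True
        else:
            above = False  # at or below the limit: nothing is stored
    return stored, excursions


# Made-up scan data: five readings, three of them above a limit of 100.
scan = [(0, 95.0), (1, 101.0), (2, 103.5), (3, 99.0), (4, 100.5)]
kept, count = store_exceedances(scan, limit=100.0)
```

Here two of the five readings are discarded and the scan is reduced to the stored exceedances plus a count of how many times the limit was crossed.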
                              Often, a batch or pallet of product may contain a large number of items. The items can
                           be sampled for some process attribute. The customer may want to know summary statistics
                           about the pallet, but the storage of all the data may not be feasible. In this case, statistical
                           results can normally be derived from summary data. Average, standard deviation, total, cor-
                           relation, maximum, and minimum are easily calculated from summary, accumulated data*:

                              Averages. Keep a running sum of data and count of readings:

                                                Average = Sum of Values / Count of Values

                              Totals. Keep a running sum of data.
                              Standard Deviation

                                  \[ \text{Standard Deviation} = \sqrt{\frac{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}{n(n-1)}} \]
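
As a rough sketch of how these summary statistics can be kept without storing individual readings, the accumulator below tracks only a count, a running sum, a running sum of squares, and the extremes. The class name, method names, and sample readings are hypothetical; the standard deviation uses the sum and sum-of-squares form of the sample formula.

```python
import math


class RunningStats:
    """Accumulate summary data for a batch without keeping the readings."""

    def __init__(self):
        self.count = 0            # number of readings seen
        self.total = 0.0          # running sum of values
        self.sum_sq = 0.0         # running sum of squared values
        self.minimum = math.inf
        self.maximum = -math.inf

    def add(self, x):
        """Fold one reading into the accumulated sums."""
        self.count += 1
        self.total += x
        self.sum_sq += x * x
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    def average(self):
        """Sum of values divided by count of values."""
        return self.total / self.count

    def stdev(self):
        """Sample standard deviation from n, sum(x), and sum(x^2)."""
        n = self.count
        if n < 2:
            raise ValueError("need at least two readings")
        variance = (n * self.sum_sq - self.total ** 2) / (n * (n - 1))
        # Rounding can push a near-zero variance slightly negative.
        return math.sqrt(max(variance, 0.0))


# Made-up readings for one pallet.
pallet = RunningStats()
for reading in [2, 4, 4, 4, 5, 5, 7, 9]:
    pallet.add(reading)
```

For these eight readings the sum is 40 and the sum of squares is 232, so the variance is (8 × 232 − 40²) / (8 × 7) = 256/56. One caution: when the mean is large relative to the spread, subtracting two nearly equal sums can lose precision in floating-point arithmetic.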


                           *Standard deviation and correlation from Ref. 6, pp. 473, 477.