Page 370 - Data Architecture

Chapter 9.2: Analyzing Repetitive Data



           Abstract



           The analysis of repetitive data has many facets. Repetitive data are found in
           open-ended continuous systems, and repetitive analytics is also done in project-based
           environments. A common practice in repetitive analytics is looking for patterns. One
           issue that always arises in repetitive pattern analysis is the occurrence of false
           positives. A useful approach to repetitive analytics is to create what is known as the
           “sandbox.” Analysis in the sandbox does not go outside the corporation. On the other
           hand, the analyst is not constrained with regard to the analysis that is done or what
           data can be analyzed. Log tapes often provide a basis for repetitive data analytics.


           Keywords



           Repetitive data; Open-ended continuous system; Project-based system; Pattern analysis;
           Outliers; False positives; The “sandbox”; Log tapes


           Much of the data found in big data are repetitive. Analyzing repetitive data in the big data
           environment is quite different from analyzing data in the nonrepetitive environment. As a
           point of departure, we need to look at what the repetitive big data environment looks like.


           Fig. 9.2.1 shows that data in the repetitive big data environment look like lots of units of
           data laid end to end.
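
As a rough illustration of "units of data laid end to end," the sketch below models a repetitive stream as a sequence of records that all share one layout. The `Reading` record, its field names, and the sample values are hypothetical and chosen only for illustration; they do not come from the text or from Fig. 9.2.1.

```python
from dataclasses import dataclass, fields


# Hypothetical record layout: the field names are illustrative assumptions.
@dataclass(frozen=True)
class Reading:
    meter_id: str
    timestamp: str
    kwh: float


# Repetitive data: many units with an identical structure, laid end to end.
# Only the values change from one unit to the next.
stream = [
    Reading("m-001", "2024-01-01T00:00", 1.2),
    Reading("m-001", "2024-01-01T01:00", 1.4),
    Reading("m-002", "2024-01-01T00:00", 0.9),
]


def layouts(records):
    """Return the set of distinct field layouts found in the records."""
    return {tuple(f.name for f in fields(r)) for r in records}


# A repetitive stream has exactly one layout, however many units it holds.
print(len(layouts(stream)))
```

Because every unit has the same structure, an analyst can scan the whole stream with a single piece of logic, which is what makes pattern analysis over repetitive data practical.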





















