Page 354 - Data Architecture

Chapter 9.1: Repetitive Analytics: Some Basics
































               Fig. 9.1.10 Distillation and filtering.


           Subsetting Data



           Filtering produces subsets of data. As repetitive data are read and filtered, the records
           are sorted into different subsets. There are many practical reasons for subsetting data,
           including the following:


               - Reduction in the volume of data that have to be analyzed. It is much easier to analyze and manipulate a
               small subset of data than to analyze the same data mixed in with many other irrelevant
               occurrences of data.
               - Purity of processing. By subsetting data, the analyst can filter out unwanted data so that the analysis
               can focus on the data that are of interest. The analytic processing that runs against a subset can
               therefore be focused tightly on the objective of the analysis.
               - Security. Once data are selected into a subset, they can be protected with higher levels of security
               than were applied when the data existed in an unfiltered state.
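
           The subsetting described above can be sketched in a few lines of Python. This is only
           an illustration under stated assumptions: the record layout, the field names (type,
           region, minutes), and the helper names subset_by and filter_records are hypothetical
           and do not come from the text.

```python
from collections import defaultdict

# Hypothetical repetitive records, invented for illustration only.
records = [
    {"id": 1, "type": "call", "region": "east", "minutes": 12},
    {"id": 2, "type": "sms", "region": "west", "minutes": 0},
    {"id": 3, "type": "call", "region": "west", "minutes": 45},
    {"id": 4, "type": "call", "region": "east", "minutes": 3},
]


def subset_by(records, key):
    """Read each record once and route it into the subset named by record[key]."""
    subsets = defaultdict(list)
    for record in records:
        subsets[record[key]].append(record)
    return dict(subsets)


def filter_records(records, predicate):
    """Keep only the records of interest, dropping everything else."""
    return [record for record in records if predicate(record)]


# One pass over the data yields one subset per region.
by_region = subset_by(records, "region")

# A filter yields a single, smaller subset that the analysis can focus on.
calls_only = filter_records(records, lambda r: r["type"] == "call")
```

           Each subset is an ordinary list, so it can be analyzed, stored, or secured
           separately from the unfiltered data.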


           Subsetting data for analysis is a common technique and has been used for as long as
           there have been data and computers.


           One use of subsetting data is to set the stage for sampling.
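
           As a minimal sketch of how a subset sets the stage for sampling, the Python below
           draws a simple random sample from a subset and computes an estimate from the sample
           alone. The helper name sample_subset, the 10 percent fraction, the fixed seed, and the
           generated records are all assumptions made for illustration; the text does not
           prescribe a particular sampling method.

```python
import random


def sample_subset(subset, fraction, seed=0):
    """Return a simple random sample holding roughly `fraction` of the subset.

    A fixed seed (an assumption of this sketch) makes the sample repeatable.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not subset:
        return []
    size = max(1, round(len(subset) * fraction))
    rng = random.Random(seed)
    return rng.sample(subset, size)  # sampling without replacement


# Hypothetical subset of 1,000 records, generated for illustration only.
subset = [{"id": i, "minutes": i % 60} for i in range(1000)]

# Analyze 10 percent of the subset instead of all of it.
sample = sample_subset(subset, fraction=0.1)
estimated_mean_minutes = sum(r["minutes"] for r in sample) / len(sample)
```

           The estimate is computed from 100 records rather than 1,000, which is where the
           savings in resources and elapsed time come from.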


           In data sampling, processing runs against a sample of data rather than against the full set
           of data. As a result, the analysis consumes considerably fewer resources, and
           the time that it takes to create the analysis is significantly reduced. And in heuristic