Page 372 - Data Architecture

Chapter 9.2: Analyzing Repetitive Data
           A block of data is a large allocation of space that the system knows how to locate.
           The block is loaded with units of data, which can be thought of as records. Each
           record, in turn, contains attributes of data.


           As an example of this organization of data, consider records of telephone calls. A
           block of data holds information about many phone calls, and the record for each phone
           call holds some basic information:


               Date and time of the phone call
               Who was making the phone call
               To whom the call was made
               How long the telephone call lasted


           There may be other incidental information, such as whether the phone call was operator
           assisted or whether it was an international call. But at the end of the day, the same
           attributes are found over and over again, for every phone call.
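
           The block, record, and attribute layering described above can be sketched in code.
           This is only an illustration: the class name CallRecord, its field names, and the
           sample phone numbers are assumptions made for the example, not part of any
           particular system.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallRecord:
    """One record: the attributes kept for a single phone call (hypothetical names)."""
    started_at: datetime             # date and time of the phone call
    caller: str                      # who was making the phone call
    callee: str                      # to whom the call was made
    duration_seconds: int            # how long the call lasted
    operator_assisted: bool = False  # incidental attribute
    international: bool = False      # incidental attribute


# A "block" is modeled here as a plain list holding many records
# that all share the same attributes.
block = [
    CallRecord(datetime(2024, 1, 15, 9, 30), "555-0100", "555-0199", 125),
    CallRecord(datetime(2024, 1, 15, 9, 42), "555-0123", "555-0145", 42,
               international=True),
]

# Because every record carries the same attributes, one expression
# can be applied uniformly across the whole block.
total_seconds = sum(record.duration_seconds for record in block)
```

           Because each record repeats the same attributes, the same access pattern works
           for every entry in the block.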


           When the system goes to look for data, it knows how to find a block of data. But once
           the block has been found, it is up to the analyst to make sense of the data inside it.
           The analyst does this by “parsing” the data: the analyst reads the data in the block,
           determines where each record is, and then, for each record, determines which attribute
           is where.
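
           The parsing steps can be sketched as follows. The block layout used here is an
           assumption made purely for illustration: one record per line, with attributes
           separated by a "|" character. Real blocks may be laid out quite differently, and
           the names parse_block and FIELD_NAMES are hypothetical.

```python
from datetime import datetime
from typing import Any, Dict, List

# Assumed attribute order within each record (hypothetical layout).
FIELD_NAMES = ("started_at", "caller", "callee", "duration_seconds")


def parse_block(block: str) -> List[Dict[str, Any]]:
    """Split a block into records, then split each record into attributes."""
    records = []
    for line in block.splitlines():      # step 1: determine where each record is
        line = line.strip()
        if not line:
            continue                     # skip blank lines between records
        values = line.split("|")         # step 2: determine which attribute is where
        if len(values) != len(FIELD_NAMES):
            raise ValueError(f"unexpected attribute count in record: {line!r}")
        record: Dict[str, Any] = dict(zip(FIELD_NAMES, values))
        # Convert the text values into usable types.
        record["started_at"] = datetime.fromisoformat(record["started_at"])
        record["duration_seconds"] = int(record["duration_seconds"])
        records.append(record)
    return records


sample_block = (
    "2024-01-15T09:30:00|555-0100|555-0199|125\n"
    "2024-01-15T09:42:10|555-0123|555-0145|42\n"
)
calls = parse_block(sample_block)
```

           The same two steps apply to every record only because each record follows the
           same layout; a record with a different number of attributes is rejected rather
           than guessed at.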


           The process of parsing would be onerous if the records tucked into the block did not
           have a high degree of similarity to one another.


           Fig. 9.2.3 shows that upon encountering a block of data in big data, there is a need to
           parse the block.

























