Page 372 - Data Architecture

Chapter 9.2: Analyzing Repetitive Data
A block of data is a large allocation of space that the system knows how to find. The
block is loaded with units of data, which can be thought of as records. Within each
record are attributes of data.

As an example of this organization of data, consider records of telephone calls. The
block of data holds information about many phone calls. The record for each phone call
holds some basic information:

Date and time of the phone call
Who was making the phone call
To whom the call was made
How long the telephone call lasted

There may be other incidental information, such as whether the phone call was operator
assisted or whether it was an international call. But at the end of the day, the same
attributes are found over and over again, for every phone call.
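
The block, record, and attribute organization described above can be sketched in code. The following is only an illustration, not a layout taken from any real system: the class name CallRecord, its field names, and the sample phone numbers are all assumptions made for this example.

```python
from dataclasses import dataclass, fields
from datetime import datetime


# Hypothetical record type: every name below is an assumption for illustration.
@dataclass(frozen=True)
class CallRecord:
    started_at: datetime             # date and time of the phone call
    caller: str                      # who was making the phone call
    callee: str                      # to whom the call was made
    duration_seconds: int            # how long the call lasted
    operator_assisted: bool = False  # incidental information
    international: bool = False      # incidental information


# A "block" is modeled here as a simple list holding many records.
block = [
    CallRecord(datetime(2015, 3, 1, 9, 15), "555-0100", "555-0199", 120),
    CallRecord(datetime(2015, 3, 1, 9, 17), "555-0142", "555-0100", 45,
               international=True),
]

# Every record in the block carries the same attributes.
attribute_names = [f.name for f in fields(CallRecord)]
```

Because each record has the same attributes, any code written for one phone call applies to every other call in the block.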

When the system goes to look for data, the system knows how to find a block of data.
But once the system finds a block of data, it is up to the analyst to make sense of the data
found in the block. The analyst does this by “parsing” the data. The analyst reads the
data in the block. Then, the analyst determines where a record is. Upon finding a record,
the analyst then determines what attribute is where.
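
The parsing steps just described can be sketched as follows. This is a minimal sketch under a stated assumption: the block is taken to be text with one record per line and attributes separated by a "|" character. That layout, the function name parse_block, and the field names are hypothetical, chosen only to make the example concrete.

```python
from datetime import datetime

# Assumed (hypothetical) layout: one record per line, attributes split by "|".
FIELD_NAMES = ("started_at", "caller", "callee", "duration_seconds")


def parse_block(block: str) -> list[dict]:
    """Read a block, locate each record, then locate each attribute."""
    records = []
    for line in block.splitlines():      # determine where each record is
        line = line.strip()
        if not line:
            continue                     # skip empty lines between records
        values = line.split("|")         # determine which attribute is where
        if len(values) != len(FIELD_NAMES):
            raise ValueError(f"unexpected record layout: {line!r}")
        record = dict(zip(FIELD_NAMES, values))
        # Convert the text values into usable types.
        record["started_at"] = datetime.fromisoformat(record["started_at"])
        record["duration_seconds"] = int(record["duration_seconds"])
        records.append(record)
    return records


sample_block = (
    "2015-03-01T09:15:00|555-0100|555-0199|120\n"
    "2015-03-01T09:17:30|555-0142|555-0100|45\n"
)
calls = parse_block(sample_block)
```

The same few lines of logic handle every record in the block, which is only possible because the records share one layout.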

Parsing would be onerous if the records tucked into the block were not highly similar to
one another.

Fig. 9.2.3 shows that upon encountering a block of data in big data, there is a need to
parse the block.
