Page 28 - Data Architecture

Chapter 1.1: An Introduction to Data Architecture

           The Great Divide of Data


           It is not at all obvious, but within unstructured data the dividing line between
           repetitive data and nonrepetitive data is very significant. In fact, this dividing line is so
           important that it can be called the "great divide" of data.


           Fig. 1.1.4 shows the great divide of data.

               Fig. 1.1.4 The great divide.


           Why this great divide should exist is hardly obvious. But there are some very good
           reasons for it:


               • Repetitive data usually have very limited business value, while nonrepetitive data are rich in
                 business value.
               • Repetitive data are handled one way; nonrepetitive data are handled very differently.
               • Repetitive data are analyzed one way; nonrepetitive data are analyzed in a very different
                 manner.
               • And so forth.

           The two worlds, that of repetitive data and that of nonrepetitive data, are as different as
           chalk and cheese. Tools and techniques that work in one world simply do not apply to the
           other world, and vice versa.

           In many ways, the great divide of data is as profound as the continental divide. At the
           continental divide, snow that falls on one side ends up as water that flows to the Pacific
           Ocean, whereas snow that falls on the other side of the divide ends up heading