Page 28 - Data Architecture


Chapter 1.1: An Introduction to Data Architecture

The Great Divide of Data

It is not at all obvious, but the dividing line within unstructured data, between
repetitive data and nonrepetitive data, is very significant. In fact, this dividing line is so
important that it can be called the “great divide” of data.

Fig. 1.1.4 shows the great divide of data.

Fig. 1.1.4 The great divide.

It is hardly obvious why there should be this great divide of data. But there are some very
good reasons for the divide:

Repetitive data usually have very limited business value, while nonrepetitive data are rich in business value.
Repetitive data can be handled one way; nonrepetitive data are handled very differently.
Repetitive data can be analyzed one way, while nonrepetitive data can be analyzed in a very different manner.
And so forth.

The two worlds, of repetitive data and of nonrepetitive data, are as different as chalk
and cheese. Tools and techniques that work in one world simply are not applicable to the
other world, and vice versa.

In many ways, the great divide of data is as profound as the continental divide. In the
continental divide, snow that falls on one side of the divide ends up as water that flows to
the Pacific Ocean, whereas snow that falls on the other side of the divide ends up heading