Page 316 - Data Architecture

Chapter 8.2: Big Data/Existing System Interface
           Fig. 8.2.1 shows the overall system flow between big data and the existing system
           environment.


           Each of the interfaces will be discussed in detail.


           Raw big data is divided into two distinct sections (see the “great divide”): repetitive
           raw big data and nonrepetitive raw big data. Repetitive raw big data is handled entirely
           differently from nonrepetitive raw big data.



           The Repetitive Raw Big Data/Existing Systems Interface



           The interface from repetitive raw big data to the existing system environment is, in
           some ways, the simplest interface. It resembles a distillation process: the mass of data
           found in raw repetitive big data is winnowed down (distilled) into the few records that
           are of interest.


           Repetitive raw big data is processed by parsing each record. When a record of interest
           is located, it is edited and passed to the existing system environment. In this fashion,
           the data of interest are distilled from the mass of records found in the raw repetitive
           big data environment. One assumption made by this interface is that the vast majority
           of records found in the repetitive component of raw big data will not be passed to the
           existing system environment; only a few records of interest are expected to be found.
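           The parse, select, edit, and pass sequence described above can be sketched as a small
           pipeline. This is a minimal sketch, not the book's implementation: it assumes each raw
           record is a comma-separated line with a hypothetical layout (`FIELDS`), and the
           `of_interest` test and the `edit` step are hypothetical placeholders for whatever rules a
           real interface would apply.

```python
import csv
from typing import Callable, Dict, Iterable, Iterator

# Hypothetical field layout for one repetitive raw record (an assumption).
FIELDS = ("record_id", "timestamp", "status", "value")


def parse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Parse each raw comma-separated line into a dict keyed by FIELDS."""
    for row in csv.reader(lines):
        yield dict(zip(FIELDS, row))


def edit(record: Dict[str, str]) -> Dict[str, object]:
    """Hypothetical edit step: trim whitespace and convert 'value' to a float."""
    return {
        "record_id": record["record_id"].strip(),
        "timestamp": record["timestamp"].strip(),
        "status": record["status"].strip(),
        "value": float(record["value"]),
    }


def distill(
    lines: Iterable[str], of_interest: Callable[[Dict[str, str]], bool]
) -> Iterator[Dict[str, object]]:
    """Parse every record, keep only those of interest, and edit them
    before they are handed to the existing system environment."""
    for record in parse(lines):
        if of_interest(record):
            yield edit(record)


raw = [
    "1,2024-01-01T00:00:00,OK,10.5",
    "2,2024-01-01T00:00:01,OK,11.0",
    "3,2024-01-01T00:00:02,FAIL, 9.8",
    "4,2024-01-01T00:00:03,OK,10.9",
]
selected = list(distill(raw, lambda r: r["status"] == "FAIL"))
print(selected)
```

           Because `distill` is a generator, records that fail the test are discarded as they are
           read, so the mass of raw data never has to be held in memory at once; only the few
           records of interest are materialized.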


           To illustrate this assumption, consider a few cases.


           Manufacturing: a manufacturer makes a product of quite high quality. On average, only
           one out of 10,000 products is defective. However, the defective products are still a
           bother. All the product manufacturing information is stored in big data, but only the
           information about the defective products is brought to the existing systems environment
           for further analysis. In percentage terms, then, only a tiny fraction of the data are
           brought to the existing system environment.
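           The proportion in the manufacturing case can be checked with a little arithmetic. This
           sketch assumes, purely for illustration, one record per product and a hypothetical run
           of 100,000 products in which exactly every 10,000th product is defective; the
           `ProductRecord` layout and the run size are not from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProductRecord:
    serial: int
    defective: bool


DEFECT_INTERVAL = 10_000  # one defect per 10,000 products, as in the text
RUN_SIZE = 100_000        # hypothetical number of products in the run

# Simulated manufacturing records held in the big data environment.
run: List[ProductRecord] = [
    ProductRecord(serial=n, defective=(n % DEFECT_INTERVAL == 0))
    for n in range(1, RUN_SIZE + 1)
]

# Only the defective products are passed to the existing systems environment.
passed = [r for r in run if r.defective]
share = len(passed) / len(run)

print(len(passed), f"{share:.2%}")  # prints: 10 0.01%
```

           The share passed stays at 1/10,000 (0.01%) however large the run grows; under the same
           assumption, a hypothetical run of a million products would send about 100 records across
           the interface.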


           Telephone calls (call detail records): on a daily basis, millions of telephone calls are
           made. But of those millions of calls, only a handful (perhaps three or four) are of
           interest. Only the calls that are of interest are brought from the big data environment
           to the existing system environment.
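           A call-detail-record filter follows the same distillation pattern. In this sketch, which
           is an assumption rather than anything the text prescribes, a call is "of interest" when
           either party appears on a watch list; the `CallRecord` fields, the sample numbers, and
           the watch list are all hypothetical. Holding the watch list in a `set` keeps the
           per-record check constant-time on average, which matters when millions of calls stream
           past.

```python
from typing import Iterable, Iterator, NamedTuple, Set


class CallRecord(NamedTuple):
    caller: str
    callee: str
    duration_s: int


def calls_of_interest(
    calls: Iterable[CallRecord], watch_list: Set[str]
) -> Iterator[CallRecord]:
    """Yield only the calls where the caller or the callee is on the watch list."""
    for call in calls:
        if call.caller in watch_list or call.callee in watch_list:
            yield call


# Hypothetical day of traffic (a real day would hold millions of records).
day = [
    CallRecord("555-0100", "555-0101", 60),
    CallRecord("555-0102", "555-0199", 300),
    CallRecord("555-0103", "555-0104", 45),
    CallRecord("555-0199", "555-0105", 12),
]
watch_list = {"555-0199"}

for call in calls_of_interest(day, watch_list):
    print(call)
```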


           Log tape analysis: a log tape of transactions is created. In a day, tens of thousands of log