Page 94 - Data Architecture

Chapter 2.1: The End-State Architecture—The “World Map”
           voice to text transcription. Written text, if it is not already in the form of electronic text,
           can be captured and transformed by optical character recognition (OCR). In whatever form
           the text originally exists, it is converted into electronic text.


           Transaction data are data that have been captured as the by-product of the execution of a
           transaction. There are many kinds of transactions. There are bank teller transactions,
           ATM transactions, airline reservations, retail purchases, credit card activity, inventory
           management transactions, payment ledger transactions, and many more. These
           transactions are usually run by applications. As a rule, applications are developed and
           built in a “siloed” fashion. This means that when one application is built, it does not take
           into consideration the other applications with which it must interact. Corporations end up
           with a whole collection of applications, each one of which acts independently. The result
           is unintegrated application data.


           Corporate data are data that have entered the system and then have been transformed
           into an integrated corporate state. The transformation moves the data out of their
           application-oriented form and into a data warehouse, where they are integrated into a
           corporate state. As a simple example of corporate integration, application A has gender
           as male/female, application B has gender designated as x/y, and application C has gender
           designated as 1/0. The corporate standard for the designation of gender is m/f. The
           application data are converted as they are moved from the application into the data
           warehouse.
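The gender example can be sketched as a small lookup-based conversion applied while data move into the warehouse. This is only an illustration: the table `GENDER_MAPS` and the function `to_corporate_gender` are hypothetical names, and the text does not say which of x/y or 1/0 stands for male or female, so those pairings are assumptions.

```python
# Hypothetical per-application code tables. The pairing of x/y and 1/0
# with male/female is an assumption; the text does not specify it.
GENDER_MAPS = {
    "A": {"male": "m", "female": "f"},
    "B": {"x": "m", "y": "f"},
    "C": {"1": "m", "0": "f"},
}


def to_corporate_gender(application, value):
    """Convert an application-specific gender code to the corporate m/f standard."""
    if application not in GENDER_MAPS:
        raise ValueError(f"unknown application: {application!r}")
    # Normalize so that 0, "0", and " Female " are all handled uniformly.
    key = str(value).strip().lower()
    mapping = GENDER_MAPS[application]
    if key not in mapping:
        raise ValueError(
            f"unmapped gender code {value!r} for application {application!r}"
        )
    return mapping[key]
```

Raising an error on an unmapped code, rather than passing the value through, is one possible design choice; it keeps unconverted application values from silently entering the integrated data.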

           The data marts contain data that are customized for the different groups that will
           use the data analytically. Typically, there are data marts for marketing, sales, finance,
           and others. The source of data for the data marts is the data warehouse.
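The relationship between the data warehouse and the data marts can be illustrated with a minimal sketch in which each mart is a view derived from a single warehouse table. The table, columns, sample rows, and view names are all hypothetical, and modeling a mart as a SQL view is an assumption made for brevity; a data mart may equally be a separately stored structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical integrated warehouse table with illustrative rows.
conn.execute(
    "CREATE TABLE warehouse_sales (region TEXT, product TEXT, amount REAL, cost REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [
        ("east", "widget", 100.0, 60.0),
        ("east", "gadget", 50.0, 20.0),
        ("west", "widget", 80.0, 48.0),
    ],
)

# Marketing mart: revenue by product, sourced only from the warehouse.
conn.execute(
    "CREATE VIEW mart_marketing AS "
    "SELECT product, SUM(amount) AS revenue FROM warehouse_sales GROUP BY product"
)

# Finance mart: margin by region, sourced from the same warehouse table.
conn.execute(
    "CREATE VIEW mart_finance AS "
    "SELECT region, SUM(amount - cost) AS margin FROM warehouse_sales GROUP BY region"
)

marketing = conn.execute(
    "SELECT product, revenue FROM mart_marketing ORDER BY product"
).fetchall()
finance = conn.execute(
    "SELECT region, margin FROM mart_finance ORDER BY region"
).fetchall()
```

Both marts present the same underlying warehouse rows, each shaped for a different group of users.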


           The data lake contains a variety of data. Some of the data found in the data lake are
           archival data. Other data in the data lake are simply bulk data. It is also possible to build
           a bulk data warehouse in the data lake. In addition, the bulk data warehouse may contain
           a bulk data vault. The bulk data warehouse is the single version of the truth for bulk
           amounts of data.


           The data ponds are the subsets of the data lake that are set aside for different purposes.
           There may be an archival data pond, a litigation support data pond, a general-purpose
           data pond, a manufacturing data pond, an analog data pond, and so forth.



           Shaping the Data Through Models

