Page 235 - Data Architecture

Chapter 6.2: Introduction to Data Vault Modeling
           that a hub is defined as a unique list of business keys. The preference is to use natural
           business keys that have meaning to the business.


           One of the functions of a properly built raw Data Vault 2.0 model is to provide
           traceability across the lines of business. To do this, the business keys must be stored in
           the hub structures according to a set of design standards.
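           The sketch below shows one way to load a hub as a unique list of business keys. It
           assumes two conventions that are common in Data Vault 2.0 practice but are not specified
           in this section. First, keys are normalized (trimmed and upper-cased) before storage.
           Second, the hub key is an MD5 hash of the normalized business key. The names HubRow,
           normalize_key, hash_key, and load_hub are hypothetical, and a Python dict stands in for
           the hub table.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class HubRow:
    hash_key: str        # assumed: MD5 of the normalized business key
    business_key: str    # normalized business key, as stored in the hub
    load_date: datetime  # when the key was first seen
    record_source: str   # system that first supplied the key


def normalize_key(raw_key: str) -> str:
    # Assumed standard: trim surrounding whitespace and upper-case the key.
    return raw_key.strip().upper()


def hash_key(business_key: str) -> str:
    # Assumed standard: upper-case hex MD5 of the normalized key.
    return hashlib.md5(business_key.encode("utf-8")).hexdigest().upper()


def load_hub(hub: dict, raw_keys: list, record_source: str) -> int:
    """Add business keys not already in the hub; return how many were added."""
    added = 0
    load_date = datetime.now(timezone.utc)
    for raw in raw_keys:
        bk = normalize_key(raw)
        hk = hash_key(bk)
        if hk not in hub:  # each business key appears in the hub exactly once
            hub[hk] = HubRow(hk, bk, load_date, record_source)
            added += 1
    return added


# Usage: "c-1001" and " C-1001 " normalize to the same key.
hub = {}
print(load_hub(hub, ["c-1001", " C-1001 ", "C-1002"], "SAP"))  # 2
print(load_hub(hub, ["C-1002"], "CRM"))                           # 0
```

           The hub only inserts keys it has not seen before. Reloading an existing key, even from a
           different record source, adds no row, and the first record source seen is retained.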


           Most of the business keys in source systems today are surrogate sequence numbers
           defined by the source application. The world is full of these “dumb” machine-generated
           numeric values: customer number, account number, invoice number, and order number,
           to name a few.


           Source System Sequence Business Keys


           Source system sequence-driven business keys make up 98% of the source data that any
           data warehouse or analytic system receives. Even transaction IDs, e-mail IDs, and the
           identifiers in some unstructured data sets, such as document IDs, are surrogates. The
           theory is that these sequences should never change and, once established and assigned,
           should always represent the same data.


           That said, the largest problem in the operational systems is one the analytic solution is
           always asked to solve: how to integrate (or master) the data set, combining it across
           business processes and making sense of data that have been assigned multiple sequence
           business keys throughout the business life cycle.


           Customer account is one example. A customer account in SAP may mean the same thing as
           a customer account in Oracle Financials or in some other customer relationship
           management (CRM) or enterprise resource planning (ERP) solution. Typically, when the
           data are passed from SAP to Oracle Financials, the receiving online transaction
           processing (OLTP) application assigns a new “business key,” or surrogate sequence ID.
           It is still the same customer account; however, the same representative data set now
           has a new key.
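           Here is a short illustration of the problem using hypothetical key values and source
           labels. The same customer account arrives from SAP and from Oracle Financials with
           different source-assigned sequence keys, so a hub built from those keys alone holds two
           entries. The same_as dictionary is a hypothetical stand-in for the mapping that an MDM
           process would supply. It is not derived from the data itself.

```python
# Hypothetical source records: one real-world customer account, re-keyed
# by Oracle Financials when it was passed over from SAP.
sap_record = {"source": "SAP", "key": "0000104711", "name": "Acme Corp"}
ofin_record = {"source": "ORACLE_FIN", "key": "88231", "name": "Acme Corp"}

# A hub built only from source-assigned business keys sees two customers.
hub_keys = {r["key"] for r in (sap_record, ofin_record)}

# Hypothetical mapping supplied by an MDM process: each source key
# points at one surviving master key.
same_as = {
    "0000104711": "0000104711",
    "88231": "0000104711",
}
master_keys = {same_as[k] for k in hub_keys}

print(len(hub_keys))     # 2 distinct hub entries
print(len(master_keys))  # 1 customer account after mastering
```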


           The issue then becomes: how do you put the records back together again? This is a
           master data management (MDM) question. With an MDM solution in place (including good
           governance and good people), it can be solved, or at least approximated, with deep
           learning and neural networks. Even statistical analysis of “similar attributes” can
           detect, within a margin of error, the multiple records that “should” be the same but
           carry different keys.
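           The sketch below gives a rough illustration of “similar attributes” scoring. It compares
           records pairwise with Python's difflib.SequenceMatcher and flags pairs whose mean
           attribute similarity meets a threshold. This is a simple string-similarity heuristic
           chosen for illustration. It is not deep learning or a production matching method. The
           records, field names, and the 0.8 threshold are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def attribute_similarity(a: dict, b: dict, fields: list) -> float:
    """Mean string similarity (0.0 to 1.0) across the given attributes."""
    scores = []
    for field in fields:
        x = str(a.get(field, "")).strip().lower()
        y = str(b.get(field, "")).strip().lower()
        scores.append(SequenceMatcher(None, x, y).ratio())
    return sum(scores) / len(scores)


def candidate_matches(records: list, fields: list, threshold: float = 0.8) -> list:
    """Return (key, key, score) for record pairs scoring at or above threshold."""
    matches = []
    for a, b in combinations(records, 2):  # every unordered pair, O(n^2)
        score = attribute_similarity(a, b, fields)
        if score >= threshold:
            matches.append((a["key"], b["key"], round(score, 2)))
    return matches


# Hypothetical records from two sources with different sequence keys.
records = [
    {"key": "SAP-0000104711", "name": "Acme Corp", "city": "Denver", "postal": "80202"},
    {"key": "OFIN-88231", "name": "ACME Corporation", "city": "Denver", "postal": "80202"},
    {"key": "OFIN-90017", "name": "Globex Inc", "city": "Boston", "postal": "02110"},
]
matches = candidate_matches(records, ["name", "city", "postal"])
print(matches)
```

           Only the first two records should be flagged. The threshold plays the role of the margin
           of error. Lowering it catches more true duplicates but also produces more false
           candidates. Pairwise comparison also grows quadratically with the number of records.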


           This business problem carries over into the data warehouse and analytic solution typically