Page 227 - Data Architecture

Chapter 6.2: Introduction to Data Vault Modeling
           The link in this case carries the key matches (from and to). This type of link structure can
           be utilized to connect master key selections or to explain the key mapping/changing from
           one source system to another. It can also be utilized to represent multilevel hierarchies
           (not shown here).
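
           As a rough illustration of the from/to key mapping described above, the following
           minimal sketch models such a link in Python. The class name SameAsLink, its fields,
           the helper resolve_master, and the sample keys are all hypothetical names chosen for
           this example; they are assumptions, not a prescribed physical design.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SameAsLink:
    """One row of a hypothetical link that maps one business key to another."""
    from_key: str        # key as it appears in one source system
    to_key: str          # key it was matched or changed to
    load_date: date      # when the mapping arrived in the warehouse
    record_source: str   # where the mapping came from (e.g., a spreadsheet)


def resolve_master(key, links):
    """Follow from -> to mappings until no further mapping exists."""
    mapping = {link.from_key: link.to_key for link in links}
    seen = set()
    # The 'seen' set guards against cycles in the mapping data.
    while key in mapping and key not in seen:
        seen.add(key)
        key = mapping[key]
    return key


# Illustrative data only: two mappings chained across three systems.
links = [
    SameAsLink("CRM-1001", "ERP-77", date(2024, 1, 5), "excel_import"),
    SameAsLink("ERP-77", "MDM-5", date(2024, 2, 1), "excel_import"),
]
master = resolve_master("CRM-1001", links)
```

           Because the raw mappings are kept as rows rather than applied destructively, the
           chain from the original key to the master key remains available for audit.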


           Note that importing the Excel spreadsheet is the first step toward managed self-service
           BI (managed SSBI). Managed SSBI is the next step in the evolution of data
           warehousing: it allows business users to interact with the raw data sets in the
           warehouse and to affect their own information marts by changing the data.


           The data vault model not only provides immediate business value but is also capable of
           tracking all relationships over time. It demonstrates the different hierarchies of data
           that are possible when loading into the warehouse (even though this example is focused
           on only two particular business keys at the moment).


           By tracking the changes to business keys, which exposes the relationships across and
           between those keys, the business can begin to ask and answer the following
           questions:


               • How long does my customer account stay in sales before it is passed to procurement?
               • Can I compare an AS-SOLD image with an AS-CONTRACTED image and an AS-MANUFACTURED image with an AS-FINANCED image?
               • How many customers do I actually have?
               • How long does it take for a customer/product/service to make it from initial sale to final delivery in my business?


           Many of these questions cannot be answered without a consistent business key that spans
           the different lines of business.
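
           To make the first question above concrete, here is a minimal sketch that assumes
           each line of business records the same business key together with a load date. The
           function name days_between_stages, the tuple layout, and the sample events are
           hypothetical and serve only to illustrate the idea.

```python
from datetime import date


def days_between_stages(events, business_key, first_stage, next_stage):
    """Days from the key's first appearance in first_stage to its first
    appearance in next_stage, or None if either stage is missing.

    events is a list of (business_key, stage, load_date) tuples.
    """
    def first_seen(stage):
        dates = [d for k, s, d in events if k == business_key and s == stage]
        return min(dates) if dates else None

    start = first_seen(first_stage)
    end = first_seen(next_stage)
    if start is None or end is None:
        return None
    return (end - start).days


# Illustrative data only: one account seen in two lines of business.
events = [
    ("ACC-1", "sales", date(2024, 1, 5)),
    ("ACC-1", "procurement", date(2024, 1, 19)),
    ("ACC-2", "sales", date(2024, 2, 1)),
]
elapsed = days_between_stages(events, "ACC-1", "sales", "procurement")
```

           The comparison only works because both lines of business carry the same business
           key; without it, the two events could not be tied to one account.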



           Why Restructure the Data From the Staging Area?



           Restructuring allows integration across multiple systems into a single place in the target
           data warehouse without changing the data set itself (i.e., no conformity). This is called
           passive integration. Data are considered passively integrated by business key because
           there is no change to the raw data. They are integrated according to location (i.e., all
           individual customer account numbers exist in the same hub, while all corporate
           customer account numbers exist in a different hub).
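
           Passive integration can be sketched as follows. This is a simplified illustration
           under stated assumptions, not a full loading pattern: the function load_hubs, the
           row fields (account_number, customer_type, source), and the sample rows are
           hypothetical, and a hub is modeled as a plain dictionary keyed by business key.

```python
from datetime import datetime, timezone


def load_hubs(staged_rows, hubs=None):
    """Place each raw business key in the hub for its customer type.

    The key value is stored exactly as staged (no cleansing, no conformity).
    A key already present in a hub is not inserted again, so rows from
    different source systems meet on the same hub entry.
    """
    hubs = {} if hubs is None else hubs
    for row in staged_rows:
        hub = hubs.setdefault(row["customer_type"], {})
        key = row["account_number"]
        if key not in hub:
            hub[key] = {
                "load_date": datetime.now(timezone.utc),
                "record_source": row["source"],  # first system to supply the key
            }
    return hubs


# Illustrative staged rows from two hypothetical source systems.
staged = [
    {"account_number": "ACC-1", "customer_type": "individual", "source": "CRM"},
    {"account_number": "ACC-1", "customer_type": "individual", "source": "BILLING"},
    {"account_number": "CORP-9", "customer_type": "corporate", "source": "CRM"},
]
hubs = load_hubs(staged)
```

           Both systems supply ACC-1, yet the individual hub holds it once, unchanged; the
           integration comes from where the key is stored rather than from altering the data.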


           In the age of big data, staging areas are also known as landing zones, data dumps, or data
           junkyards. Staging areas are a logical concept that can manifest themselves physically in