Page 228 - Data Architecture

Chapter 6.2: Introduction to Data Vault Modeling
           multiple environments. A “staging area” may be a file store on Amazon S3 or in the Azure
           cloud, or it may be a Hadoop Distributed File System (HDFS). It may also be a relational
           database table structure. Staging areas gather the data under a single concept in preparation
           for moving it downstream.



           What Are the Basic Rules of the Data Vault Model?



           Data vault modeling has some fundamental rules that must be followed; otherwise the
           model no longer qualifies as a data vault model. The full set of rules is covered in a
           classroom setting, but some of them are listed below:


            1.  Business keys are separated by grain and semantic meaning. For example, customer corporation
               and customer individual must be recorded in two separate hub structures.
            2.  Relationships, events, and intersections across two or more business keys are placed into link
               structures.
            3.  Link structures have no begin or end dates; they merely express the relationship as it stood at the
               time the data arrived in the warehouse.
            4.  Satellites are separated by type or classification of data and by rate of change. The type of data
               typically corresponds to a single source system.
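
           The four rules above can be sketched as plain record types. This is a minimal
           illustration rather than a reference implementation: the class names, field names,
           and sample keys (HubRow, LinkRow, SatelliteRow, "ACME-001", "crm", and so on) are
           all assumptions made for this example and do not come from the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class HubRow:
    """One business key in one hub (rule 1: one hub per grain and meaning)."""
    business_key: str
    load_date: datetime
    record_source: str


@dataclass
class LinkRow:
    """A relationship across two or more business keys (rule 2).

    Rule 3: there is no begin or end date, only the arrival timestamp.
    """
    business_keys: tuple[str, ...]
    load_date: datetime
    record_source: str


@dataclass
class SatelliteRow:
    """Descriptive attributes for one parent key (rule 4).

    A separate satellite would be kept per source system and rate of change.
    """
    parent_key: str
    load_date: datetime
    record_source: str
    attributes: dict[str, str] = field(default_factory=dict)


arrived = datetime(2024, 1, 1)

# Rule 1: corporations and individuals live in two separate hubs.
hub_customer_corporation = {"ACME-001": HubRow("ACME-001", arrived, "crm")}
hub_customer_individual = {"P-1001": HubRow("P-1001", arrived, "crm")}

# Rules 2 and 3: the relationship is a link row with no validity dates.
link_corporation_contact = LinkRow(("ACME-001", "P-1001"), arrived, "crm")

# Rule 4: slowly changing CRM attributes and fast-changing billing attributes
# go into different satellites, even though they describe the same hub key.
sat_corporation_crm = SatelliteRow("ACME-001", arrived, "crm", {"name": "Acme"})
sat_corporation_billing = SatelliteRow(
    "ACME-001", arrived, "billing", {"balance": "120.00"}
)
```

           Keeping the link free of begin and end dates means a load routine only ever appends
           rows; any notion of validity would be derived later from the arrival timestamps.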


           Raw data vault modeling neither allows nor provides for concepts such as
           conformity, nor does it deal with supertypes. Those concepts belong to business vault
           models (another form of data vault modeling, used as an information delivery
           layer).



           Why Do We Need Many-to-Many Link Structures?



           Many-to-many link structures make the data vault model future-proof and extensible.
           The relationships expressed in source systems often reflect the business rules or
           business execution of today. Relationship definitions have changed over time and will
           continue to change. To represent both historical and future data (without reengineering
           the model and the load routines), many-to-many relationship tables are necessary.


           This is how the Data Vault 2.0 data warehouse can expose patterns of relationship
           change over time, answering questions such as: where is the gap between “current
           requirements” and the “relationships” in history? The many-to-many table (link) in the raw
           data vault provides metrics on what percentage of the data is “broken” and when that
           data broke the relationship requirement.
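
           As a rough sketch of such a metric, suppose the current requirement is that each
           account belongs to exactly one customer. That requirement, the sample rows, and the
           function name broken_relationships are all hypothetical, chosen only to illustrate
           the idea. Because the link stores every pairing that ever arrived, the share of
           accounts that violate the requirement, and the date each one first did so, can be
           read straight from it:

```python
from collections import defaultdict
from datetime import date

# Hypothetical link rows: (account_key, customer_key, load_date).
link_rows = [
    ("A1", "C1", date(2023, 1, 5)),
    ("A2", "C2", date(2023, 1, 5)),
    ("A3", "C1", date(2023, 2, 10)),
    ("A4", "C4", date(2023, 3, 1)),
    ("A2", "C3", date(2023, 6, 1)),  # A2 gains a second customer here
]


def broken_relationships(rows):
    """Map each account to the load date on which it first broke the rule.

    The assumed rule is "one customer per account"; an account breaks it
    when a second distinct customer key arrives for it.
    """
    customers_seen = defaultdict(set)
    first_break = {}
    for account, customer, loaded in sorted(rows, key=lambda row: row[2]):
        customers_seen[account].add(customer)
        if len(customers_seen[account]) > 1 and account not in first_break:
            first_break[account] = loaded
    return first_break


breaks = broken_relationships(link_rows)
total_accounts = len({account for account, _, _ in link_rows})
percent_broken = 100.0 * len(breaks) / total_accounts

print(f"{percent_broken:.1f}% of accounts are broken")
for account, loaded in breaks.items():
    print(f"{account} first broke the rule on {loaded.isoformat()}")
```

           Had the relationship been modeled as a single foreign key on the account, the second
           pairing would have overwritten the first or been rejected at load time, and neither
           number could be computed.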

