Page 259 - Data Architecture

Chapter 6.5: Introduction to Data Vault Implementation
           order to accomplish the task?


           Chances are the answer is no. The team must redesign, reengineer, and rearchitect the
           process design in order to accomplish the task in the allotted time frame. So, the redesign
           is complete. The team now deploys an ETL tool and introduces logic to load the data
           set.


           Scenario #3: One billion rows of data, arriving every 45 minutes, highly structured. The
           requirement is to load the data warehouse in a 40-minute time frame (otherwise the
           queue of incoming data backs up). The question again: can the team use the same
           “process design” they just applied to accomplish this task? Can the team execute
           without redesign?


           Again, most likely the answer is no. The team must once again redesign the process
           because it doesn’t meet the service level agreement (requirements). This type of redesign
           occurs again and again until the team reaches a CMMI level 5 optimized state for the
           pattern.


           The problem is that any significant change to any of the axes on the pyramid causes a
           redesign to occur. The only solution is to find, mathematically, the correct design: one
           that will scale regardless of time, volume, velocity, or variety. Without such a design,
           teams are left with unsustainable systems that try (unsuccessfully) to deal with big
           data problems.


           The Data Vault 2.0 implementation standards hand these designs to the BI solution team,
           regardless of the technology underneath. The implementation patterns applied to these
           designs scale. They are based on mathematical principles of scale and simplicity,
           including some of the foundations of set logic, parallelism, and partitioning.
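
           The book does not prescribe code at this point, but the combination of set logic,
           parallelism, and partitioning can be illustrated with a minimal sketch. The function
           names (partition_key, load_partition, parallel_load) and the hash-partitioning scheme
           are my own assumptions, not part of the Data Vault 2.0 standard; the sketch only shows
           why such a pattern keeps working as volume grows: each worker owns a disjoint slice of
           the keys, and each slice is processed as a set rather than row by row.

           ```python
           from concurrent.futures import ThreadPoolExecutor
           from hashlib import md5

           def partition_key(business_key: str, partitions: int) -> int:
               # Hash-partition the business key so every worker owns a disjoint slice.
               return int(md5(business_key.encode()).hexdigest(), 16) % partitions

           def load_partition(rows):
               # Set-based "load": deduplicate the whole slice in one operation,
               # instead of a row-by-row existence lookup.
               return set(rows)

           def parallel_load(rows, partitions=4):
               # Route each incoming key to its partition bucket.
               buckets = [[] for _ in range(partitions)]
               for r in rows:
                   buckets[partition_key(r, partitions)].append(r)
               # Process every partition in parallel; no cross-partition coordination
               # is needed because the hash guarantees disjoint key ranges.
               with ThreadPoolExecutor(max_workers=partitions) as pool:
                   results = pool.map(load_partition, buckets)
               return set().union(*results)

           incoming = ["cust-1", "cust-2", "cust-1", "cust-3"]
           print(sorted(parallel_load(incoming)))
           ```

           Because the pattern itself does not change when volume, velocity, or partition count
           changes (only the number of buckets and workers does), no redesign is triggered by a
           shift along those axes.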


           Teams that engage with the Data Vault 2.0 implementation best practices inherit the
           designs as an artifact for big data systems. By leveraging these patterns, the team no
           longer suffers from rearchitecture or redesign just because one or more of the
           axes/parameters change.


           Why Do We Need to Virtualize Our Data Marts?



           They should no longer be called data marts—they provide information to the business—
           therefore, they should be called information marts. There is a split between data,