Page 259 - Data Architecture

Chapter 6.5: Introduction to Data Vault Implementation
           order to accomplish the task?


           Chances are the answer is no. The team must redesign, reengineer, and rearchitect the
           process design in order to accomplish the task in the allotted time frame. So, the redesign
           is complete. The team now deploys an ETL tool and introduces logic to load the data
           set.


           Scenario #3: One billion rows of data, arriving every 45 minutes, highly structured. The
           requirement is to load the data warehouse in a 40-minute time frame (otherwise the
           queue of incoming data backs up). The question again: can the team use the same
           “process design” they just applied to accomplish this task? Can the team execute
           without redesign?


           Again, most likely the answer is no. The team must once again redesign the process
           because it doesn’t meet the service level agreement (requirements). This type of redesign
           occurs again and again until the team reaches a CMMI level 5 optimized state for the
           pattern.


           The problem is that any significant change to any of the axes on the pyramid causes a
           redesign to occur. The only solution is to find, mathematically, the correct design: one
           that will scale regardless of time, volume, velocity, or variety. Without such a design,
           teams are left with unsustainable systems that try (unsuccessfully) to deal with big
           data problems.


           The Data Vault 2.0 implementation standards hand these designs to the BI solution team,
           regardless of the technology underneath. The implementation patterns applied to these
           designs scale. They are based on mathematical principles of scale and simplicity,
           including some of the foundations of set logic, parallelism, and partitioning.
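
           The book does not prescribe code at this point, but the combination of set logic,
           parallelism, and partitioning can be illustrated with a minimal sketch. The function
           names (partition_key, load_partition, parallel_load) and the hash-partitioning scheme
           are my own assumptions, not part of the Data Vault 2.0 standard; the sketch only shows
           why such a pattern keeps working as volume grows: each worker owns a disjoint slice of
           the keys, and each slice is processed as a set rather than row by row.

           ```python
           from concurrent.futures import ThreadPoolExecutor
           from hashlib import md5

           def partition_key(business_key: str, partitions: int) -> int:
               # Hash-partition the business key so every worker owns a disjoint slice.
               return int(md5(business_key.encode()).hexdigest(), 16) % partitions

           def load_partition(rows):
               # Set-based "load": deduplicate the whole slice in one operation,
               # instead of a row-by-row existence lookup.
               return set(rows)

           def parallel_load(rows, partitions=4):
               # Route each incoming key to its partition bucket.
               buckets = [[] for _ in range(partitions)]
               for r in rows:
                   buckets[partition_key(r, partitions)].append(r)
               # Process every partition in parallel; no cross-partition coordination
               # is needed because the hash guarantees disjoint key ranges.
               with ThreadPoolExecutor(max_workers=partitions) as pool:
                   results = pool.map(load_partition, buckets)
               return set().union(*results)

           incoming = ["cust-1", "cust-2", "cust-1", "cust-3"]
           print(sorted(parallel_load(incoming)))
           ```

           Because the pattern itself does not change when volume, velocity, or partition count
           changes (only the number of buckets and workers does), no redesign is triggered by a
           shift along those axes.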


           Teams that engage with the Data Vault 2.0 implementation best practices inherit the
           designs as an artifact for big data systems. By leveraging these patterns, the team no
           longer suffers from rearchitecture or redesign just because one or more of the
           axes/parameters change.


           Why Do We Need to Virtualize Our Data Marts?



           They should no longer be called data marts—they provide information to the business—
           therefore, they should be called information marts. There is a split between data,