Page 166 - Building Big Data Applications
Chapter 9 Governance
Fig. 9.4 shows the detailed processing of data across the different stages, from source
systems to the data warehouse and downstream systems. When the stages are implemented with
metadata and master data integration, each stage becomes self-contained, and we can
manage the complexities of each stage within that stage’s scope of processing, as
discussed next:
Acquire stage: In this stage of data processing, we simply collect data from multiple
sources. The acquisition process can be implemented in several ways: as a direct extract
from a database, as data sent in flat files, or as data made available through a web
service for extraction and processing.
Metadata at this stage will include the control file (if provided), the extract file
name, size, and source system identification. All of this data can be collected as
a part of the audit process.
Master data has no role at this stage, as it relates more to the content of the
data extracts, which is handled in the process stage.
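The acquire-stage metadata listed above (control file, extract file name, size, and source system identification) can be captured as a small audit record. The following is only a minimal sketch in Python: the names `ExtractAuditRecord` and `collect_extract_metadata`, the timestamp field, and the sample file and source system are illustrative assumptions, not part of any specific tool described in this chapter.

```python
import os
import tempfile
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ExtractAuditRecord:
    """Metadata captured for one extract file at the acquire stage (hypothetical layout)."""
    source_system: str            # source system identification
    file_name: str                # extract file name
    size_bytes: int               # extract file size
    control_file: Optional[str]   # control file name, if one was provided
    collected_at: str             # audit timestamp (assumed field, UTC ISO 8601)


def collect_extract_metadata(extract_path, source_system, control_path=None):
    """Build an audit record from file attributes only; the content is not read."""
    return ExtractAuditRecord(
        source_system=source_system,
        file_name=os.path.basename(extract_path),
        size_bytes=os.path.getsize(extract_path),
        control_file=os.path.basename(control_path) if control_path else None,
        collected_at=datetime.now(timezone.utc).isoformat(),
    )


# Usage: audit a small throwaway extract file (sample names are made up).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "customers_20240101.csv")
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write("id,name\n1,Acme\n")
    record = collect_extract_metadata(path, source_system="CRM")

print(asdict(record))
```

Because the record is built without opening the extract's content, it reflects the point made above that master data plays no part at this stage.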
Process stage: In this stage, data transformation and standardization, including the
application of data quality rules, are completed, and the data is prepared for
loading into the data warehouse, data mart, or analytical database. In this stage,
both metadata and master data play key roles.
Metadata is used in the data structures, rules, and data quality processing.
Master data is used for processing and standardizing the key business entities.
Metadata is used to process audit data.
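The three roles listed above can be sketched together in a few lines of Python. Everything here is a hypothetical illustration: `FIELD_RULES` stands in for metadata-defined structure and data quality rules, `CUSTOMER_MASTER` stands in for a master data lookup of a key business entity, and the `audit` counters stand in for audit metadata. Rejecting rows whose customer is not found in master data is an assumed policy chosen for the example, not one the text prescribes.

```python
# Hypothetical metadata: expected fields and data quality rules per field.
FIELD_RULES = {
    "customer": {"required": True},
    "amount": {"required": True, "min": 0.0},
}

# Hypothetical master data: known name variants mapped to the standard entity name.
CUSTOMER_MASTER = {
    "acme corp": "ACME Corporation",
    "acme corporation": "ACME Corporation",
    "globex": "Globex Inc.",
}


def passes_rules(row):
    """Apply the metadata-driven data quality rules to one row."""
    for field, rule in FIELD_RULES.items():
        value = row.get(field)
        if rule.get("required") and value in (None, ""):
            return False
        if "min" in rule:
            try:
                if float(value) < rule["min"]:
                    return False
            except (TypeError, ValueError):
                return False  # non-numeric value where a number is expected
    return True


def process_records(records):
    """Return (clean_rows, audit) after quality checks and standardization."""
    clean = []
    audit = {"read": 0, "loaded": 0, "rejected": 0}  # audit metadata for this run
    for row in records:
        audit["read"] += 1
        if not passes_rules(row):
            audit["rejected"] += 1
            continue
        key = str(row["customer"]).strip().lower()
        if key not in CUSTOMER_MASTER:  # assumed policy: unknown entities are rejected
            audit["rejected"] += 1
            continue
        clean.append({"customer": CUSTOMER_MASTER[key], "amount": float(row["amount"])})
        audit["loaded"] += 1
    return clean, audit


# Usage with made-up sample rows.
rows = [
    {"customer": "Acme Corp", "amount": "120.50"},
    {"customer": " GLOBEX ", "amount": "75"},
    {"customer": "Acme Corporation", "amount": "-5"},  # fails the minimum rule
    {"customer": "Initech", "amount": "10"},           # not in master data
    {"customer": "", "amount": "3"},                   # missing required field
]
clean, audit = process_records(rows)
print(clean)
print(audit)
```

Keeping the rules and the master lookup as data rather than hard-coded logic is one way to keep the stage self-contained: changing a rule or adding a name variant does not require changing the processing code.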
FIGURE 9.4 Data processing cycles with integration of MDM and metadata.