Page 167 - Building Big Data Applications
In this processing stage of data movement and management, metadata is
essential to ensure auditability and traceability of data and process.
Storage Stage: in this stage the data, transformed for its final storage at rest, is
loaded into the data structures. Metadata can be useful in creating agile processes to
load and store data in a scalable and flexible architecture.
Metadata used in this stage includes the loading process, data structures, audit
process, and exception processing.
Distribution Stage: in this stage, data is extracted or processed for use in
downstream systems. Metadata is useful in determining the different extract
programs, the interfaces between the data warehouse or data mart and the
downstream applications, and in auditing data usage and user activity.
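The metadata categories named for the Storage Stage (loading process, data structures, audit process, and exception processing) can be illustrated with a minimal sketch. Everything in it is a hypothetical assumption made for illustration and not taken from the text: the `LOAD_METADATA` dictionary, the `customer` structure, and the `load` function. It shows only the general idea that a load routine can be driven by metadata describing the target structure, while producing an audit record and routing bad rows to exception processing.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical data-structure metadata: target name -> expected column types.
LOAD_METADATA = {
    "customer": {"customer_id": int, "name": str},
}


@dataclass
class LoadResult:
    loaded: list = field(default_factory=list)      # rows accepted for the target structure
    exceptions: list = field(default_factory=list)  # (row, reason) pairs for exception processing
    audit: dict = field(default_factory=dict)       # audit record for traceability


def load(target, rows, metadata=LOAD_METADATA):
    """Validate rows against the metadata for `target` and build an audit record."""
    schema = metadata[target]
    result = LoadResult()
    for row in rows:
        missing = [c for c in schema if c not in row]
        wrong = [c for c, t in schema.items()
                 if c in row and not isinstance(row[c], t)]
        if missing or wrong:
            # Rejected rows are kept with a reason instead of being dropped silently.
            result.exceptions.append((row, f"missing={missing} wrong_type={wrong}"))
        else:
            result.loaded.append(row)
    result.audit = {
        "target": target,
        "rows_in": len(rows),
        "rows_loaded": len(result.loaded),
        "rows_rejected": len(result.exceptions),
        "loaded_at": datetime.now(timezone.utc).isoformat(),
    }
    return result


result = load("customer", [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": "2", "name": "Bob"},  # wrong type for customer_id
    {"name": "Cy"},                       # missing customer_id
])
print(result.audit)
```

Because the expected structure lives in metadata and not in the load code, adding a new target in this sketch means adding a dictionary entry, which is one way to read the "agile processes" the text refers to.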
In an efficiently designed system, as described in Fig. 11.4, we can create a
highly scalable and powerful data processing architecture based on metadata and
master data. The challenge in this situation is the processing complexity, and how the
architecture and design of the data management platform can be compartmentalized so
that the complexities of each stage are isolated within its own layer of integration.
Modern data architecture design will need this approach to process and manage the
lifecycle of data in any organization.
We have discussed the use of metadata and master data in creating an agile
and scalable solution for processing data with applications in the modern data
warehouse. The next section will focus on implementing governance and on leveraging
its benefits.
Processing complexity of big data
The most complicated step in processing big data lies not just in the volume or
velocity of the data but also in its:
Variety of formats: data can be presented for processing as Excel spreadsheets,
Word documents, PDF files, OCR data, email, data from content management
platforms, data from legacy applications, and data from web applications. Sometimes
it may be variations of the same data over many time periods where the metadata
changed significantly.
Ambiguity of data: ambiguity can arise from simple issues such as naming conventions,
similar column names for different types of data, or the same column storing different
types of data. A lack of metadata and taxonomies can significantly delay the
processing of this data.
Abstracted layers of hierarchy: the most complex area in big data processing is
the hidden layers of hierarchy. Data contained in textual, semistructured, image