Page 168 - Building Big Data Applications
Chapter 9 Governance
and video, and converted documents from audio conversations all have context, and
without appropriate contextualization the associated hierarchy cannot be processed.
Incorrect hierarchy attribution will result in datasets that may not be relevant.
Lack of metadata: there is no metadata within the documents or files containing
big data. While this is not unusual, it poses challenges when attributing the
metadata to the data during processing. The use of taxonomies and semantic libraries
will be useful in flagging the data and subsequently processing it.
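As a rough illustration of flagging data with a taxonomy, the sketch below tags a piece of text with every category whose keywords it contains. The taxonomy contents, the category names, and the function `tag_document` are hypothetical and chosen only for this example; a real semantic library would be far richer.

```python
import re

# Hypothetical taxonomy (an assumption for illustration):
# category name -> keywords that signal that category.
TAXONOMY = {
    "billing": {"invoice", "payment", "refund"},
    "support": {"complaint", "outage", "ticket"},
}


def tag_document(text, taxonomy=TAXONOMY):
    """Return the sorted taxonomy categories whose keywords appear in text."""
    # Lowercase the text and split it into alphabetic tokens.
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    # A category applies when at least one of its keywords is present.
    return sorted(cat for cat, keywords in taxonomy.items() if tokens & keywords)


if __name__ == "__main__":
    doc = "Customer called about a refund after the outage."
    print(tag_document(doc))
```

The tags produced this way can be stored next to the raw data as derived metadata, so that later processing steps can filter on them.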
Processing limitations
Write-once model: with big data there is no update processing logic due to the
intrinsic nature of the data that is being processed. Data with changes will be
processed as new data.
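The write-once behavior can be sketched as an append-only store in which a changed record is written as a new version instead of overwriting the old one. The class `WriteOnceStore` and its method names are assumptions made for this example and do not correspond to any particular big data platform.

```python
class WriteOnceStore:
    """Append-only store: records are never updated in place."""

    def __init__(self):
        # Every version ever written, in arrival order: (key, version, value).
        self._log = []

    def write(self, key, value):
        """Append value as the next version of key and return that version."""
        version = sum(1 for k, _, _ in self._log if k == key) + 1
        self._log.append((key, version, value))
        return version

    def latest(self, key):
        """Return (version, value) of the most recent write for key."""
        for k, version, value in reversed(self._log):
            if k == key:
                return version, value
        raise KeyError(key)

    def history(self, key):
        """Return all (version, value) pairs for key, oldest first."""
        return [(v, val) for k, v, val in self._log if k == key]


if __name__ == "__main__":
    store = WriteOnceStore()
    store.write("cust-1", {"city": "Austin"})
    store.write("cust-1", {"city": "Dallas"})  # a change arrives as new data
    print(store.latest("cust-1"))
    print(store.history("cust-1"))
```

Readers that need the current state take the latest version, while the full history stays available for audit.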
Data fracturing: due to the intrinsic storage design, data can be fractured across
the big data infrastructure. Processing logic needs to understand the appropriate
metadata schema used in loading the data. If this match is missed, then errors
could creep into processing the data.
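One way to guard against a missed schema match is to compare the metadata recorded with each data fragment against the schema the processing logic expects, before any processing starts. The following is a minimal sketch under that assumption; the fragment layout, the field names, and `check_fragments` are hypothetical.

```python
# Hypothetical schema the processing logic expects (illustration only).
EXPECTED_SCHEMA = {"name": "events", "version": 2, "fields": ("id", "ts", "payload")}


def check_fragments(fragments, expected=EXPECTED_SCHEMA):
    """Split fragment ids into those that match the expected schema and those that do not."""
    matched, mismatched = [], []
    for frag in fragments:
        meta = frag["schema"]
        ok = (
            meta["name"] == expected["name"]
            and meta["version"] == expected["version"]
            and tuple(meta["fields"]) == tuple(expected["fields"])
        )
        if ok:
            matched.append(frag["id"])
        else:
            mismatched.append(frag["id"])
    return matched, mismatched


if __name__ == "__main__":
    fragments = [
        {"id": "node-a/part-0",
         "schema": {"name": "events", "version": 2, "fields": ["id", "ts", "payload"]}},
        {"id": "node-b/part-1",
         "schema": {"name": "events", "version": 1, "fields": ["id", "ts"]}},
    ]
    ok, bad = check_fragments(fragments)
    print("process:", ok)
    print("quarantine:", bad)
```

Fragments that fail the check can be set aside for review instead of being processed with the wrong schema.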
Big data processing can have combinations of these limitations and complexities,
which will need to be accommodated in the processing of the data. The next section
discusses the steps in processing big data.
Governance model for building an application
The subject is easy to state but becomes extremely complex when examined layer by layer,
so we will look at each layer in turn. Fig. 9.5 describes the layers of governance related
to building applications.
As seen in Fig. 9.5, several layers of governance apply in big data applications.
The first layer is the data layer, which covers the management of the big data that will be
used throughout the exercise. This includes data acquisition, data discovery, data analysis,
data tagging, metadata processing, master data integration, and data delivery to data lakes
and data hubs. Understanding the complexity of governance in these layers is essential,
because data discovery, exploration, and analysis will be done by many business users across
different teams, and their outcomes, logs, and analyses need to be sorted, processed, and
managed efficiently. We will deliver these processes and services both as microservices and
as robotic process automation exercises. These need to be documented, captured, and stored
for easy access and use, and this functionality also falls under governance.