Page 168 - Building Big Data Applications

Chapter 9   Governance 167


                   and video, and converted documents from audio conversations all have context
                   and without appropriate contextualization the associated hierarchy cannot be pro-
                   cessed. Incorrect hierarchy attribution will result in datasets that may not be
                   relevant.
                   Lack of metadata: there is no metadata within the documents or files containing
                   big data. While this is not unusual, it poses challenges when attributing the
                   metadata to the data during processing. The use of taxonomies and semantic libraries
                   will be useful in flagging the data and subsequently processing it.
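
                 As a rough illustration of taxonomy-based flagging, the sketch below tags a
                 document that arrives without metadata by matching its words against a small
                 taxonomy. The taxonomy contents and the function name tag_document are
                 hypothetical and chosen only for illustration; a real semantic library would
                 be far richer than a keyword lookup.

```python
import re

# Hypothetical taxonomy: tag name -> terms that signal that tag.
TAXONOMY = {
    "billing": {"invoice", "payment", "refund"},
    "support": {"complaint", "outage", "ticket"},
}


def tag_document(text, taxonomy=TAXONOMY):
    """Return the sorted taxonomy tags whose terms occur in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(tag for tag, terms in taxonomy.items() if words & terms)


# The derived tags can then be attached to the data as metadata before processing.
tags = tag_document("Customer filed a complaint about a late refund.")
```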

                 Processing limitations


                   Write-once model: with big data there is no update processing logic, due to the
                   intrinsic nature of the data that is being processed. Data with changes will be
                   processed as new data.
                   Data fracturing: due to the intrinsic storage design, data can be fractured across
                   the big data infrastructure. Processing logic needs to understand the appropriate
                   metadata schema used in loading the data. If the processing logic does not match
                   that schema, errors can creep into processing the data.
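
                 The two limitations above can be sketched together as a minimal in-memory
                 example. It assumes an append-only store in which a changed record is written
                 as a new version, and in which each record carries the identifier of the schema
                 used at load time so that a mismatch fails loudly at read time. The class name
                 WriteOnceStore and the schema_id field are hypothetical, not part of any
                 specific big data platform.

```python
from dataclasses import dataclass, field


@dataclass
class WriteOnceStore:
    """Append-only store: a change is written as a new version, never as an update."""

    records: list = field(default_factory=list)

    def write(self, key, payload, schema_id):
        # Count existing versions of this key; the new record gets the next number.
        version = sum(1 for r in self.records if r["key"] == key) + 1
        record = {
            "key": key,
            "version": version,
            "schema_id": schema_id,
            "payload": payload,
        }
        self.records.append(record)
        return record

    def read(self, key, expected_schema_id):
        """Return every version of key, raising if any was loaded under another schema."""
        matches = [r for r in self.records if r["key"] == key]
        for r in matches:
            if r["schema_id"] != expected_schema_id:
                raise ValueError(
                    f"schema mismatch for {key!r}: loaded with {r['schema_id']!r}, "
                    f"expected {expected_schema_id!r}"
                )
        return matches


store = WriteOnceStore()
store.write("customer-1", {"city": "Austin"}, schema_id="customer_v1")
store.write("customer-1", {"city": "Boston"}, schema_id="customer_v1")  # change stored as new data
history = store.read("customer-1", expected_schema_id="customer_v1")
```

                 Reading with a schema identifier other than the one used at load time raises an
                 error instead of silently processing mismatched data.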

                   Big data processing can have combinations of these limitations and complexities,
                 which will need to be accommodated in the processing of the data. The next section
                 discusses the steps in processing big data.


                 Governance model for building an application

                 Governance is easy to state but becomes extremely complex when examined layer by
                 layer, which is how we will approach it here. Fig. 9.5 describes the layers of
                 governance involved in building applications.
                   As seen in Fig. 9.5, there are several layers of governance applied in big data applications.
                 The first layer is the data layer, which covers the data management of big data that will be
                 used in the entire exercise. This includes data acquisition, data discovery, data analysis, data
                 tagging, metadata processing, master data integration, and data delivery to data lakes and
                 data hubs. It is essential to understand the complexity of governance in this layer, because
                 data discovery, exploration, and analysis will be done by many business users across
                 different teams, and their outcomes, logs, and analyses need to be sorted, processed, and
                 managed efficiently. We will deliver these processes and services both as microservices and
                 as robotic process automation exercises. These need to be documented, captured, and stored
                 for easy access and use, and that functionality also falls under governance.
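
                 One way to picture the capture of outcomes and logs across teams is a small
                 governance log keyed by data layer stage. This is only a sketch under stated
                 assumptions: the stage names are taken from the data layer list above, while
                 the GovernanceLog class, its methods, and its field names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Stages of the data layer as listed in the text above.
DATA_LAYER_STAGES = (
    "acquisition",
    "discovery",
    "analysis",
    "tagging",
    "metadata_processing",
    "master_data_integration",
    "delivery",
)


class GovernanceLog:
    """Captures who did what at which data layer stage, for later lookup."""

    def __init__(self):
        self._entries = []

    def record(self, stage, team, user, outcome):
        # Reject stages that are not part of the governed data layer.
        if stage not in DATA_LAYER_STAGES:
            raise ValueError(f"unknown data layer stage: {stage!r}")
        entry = {
            "stage": stage,
            "team": team,
            "user": user,
            "outcome": outcome,
            "recorded_at": datetime.now(timezone.utc).isoformat(),
        }
        self._entries.append(entry)
        return entry

    def by_team(self):
        """Group the captured entries by team, preserving insertion order."""
        grouped = defaultdict(list)
        for entry in self._entries:
            grouped[entry["team"]].append(entry)
        return dict(grouped)


log = GovernanceLog()
log.record("discovery", "marketing", "user_a", "profiled candidate datasets")
log.record("analysis", "finance", "user_b", "built quarterly summary")
log.record("tagging", "marketing", "user_c", "applied campaign tags")
grouped = log.by_team()
```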