Page 167 - Building Big Data Applications



                In this processing stage of data movement and management, metadata is
             essential for ensuring the auditability and traceability of data and process.
               Storage Stage: in this stage, the data transformed for final storage at rest is
                loaded into the data structures. Metadata can be useful in creating agile processes to
                load and store data in a scalable and flexible architecture.
                  Metadata used in this stage includes the loading process, data structures, audit
                   process, and exception processing.
               Distribution Stage: in this stage, data is extracted or processed for use in
                downstream systems. Metadata is useful in determining the different extract
                programs, the interfaces between the data warehouse or data mart and the
                downstream applications, and in auditing data usage and user activity.
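
The storage stage described above can be illustrated with a minimal sketch of a metadata-driven load. It is an illustration under stated assumptions, not the book's implementation: the metadata layout (column name mapped to a type-casting function), the `customer` target, and the names `CUSTOMER_METADATA`, `LoadResult`, and `load_with_metadata` are all hypothetical. The sketch shows how a single metadata definition can drive the load itself, the exception processing for rows that do not fit the structure, and the audit record.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Iterable

# Hypothetical metadata for one target structure: column name -> caster.
CUSTOMER_METADATA: dict[str, Callable[[Any], Any]] = {
    "customer_id": int,
    "name": str,
    "balance": float,
}


@dataclass
class LoadResult:
    loaded: list = field(default_factory=list)      # rows that matched the metadata
    exceptions: list = field(default_factory=list)  # rows routed to exception processing
    audit: dict = field(default_factory=dict)       # one audit record per load run


def load_with_metadata(rows: Iterable[dict], metadata: dict, target_name: str) -> LoadResult:
    """Cast each row according to the metadata; reject rows that do not fit."""
    result = LoadResult()
    for line_no, row in enumerate(rows, start=1):
        try:
            # Only columns named in the metadata are loaded, each cast to its declared type.
            result.loaded.append({col: cast(row[col]) for col, cast in metadata.items()})
        except (KeyError, ValueError, TypeError) as exc:
            # Missing columns or uncastable values go to the exception list, not the target.
            result.exceptions.append({"line": line_no, "row": row, "error": repr(exc)})
    result.audit = {
        "target": target_name,
        "rows_read": len(result.loaded) + len(result.exceptions),
        "rows_loaded": len(result.loaded),
        "rows_rejected": len(result.exceptions),
        "loaded_at": datetime.now(timezone.utc).isoformat(),
    }
    return result
```

Calling `load_with_metadata(rows, CUSTOMER_METADATA, "customer")` returns the loaded rows, the rejected rows, and the audit record together. Because the target structure lives in the metadata rather than in the loader, adding a new target under this assumption means adding a new metadata definition, not new loading code.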
                In an efficiently designed system such as the one described in Fig. 11.4, we can
             create an extremely scalable and powerful data processing architecture based on
             metadata and master data. The challenge lies in the processing complexity and in how
             the architecture and design of the data management platform can be compartmentalized
             so that the complexities of each stage are isolated within its own layer of integration.
             Modern data architecture design will require this approach to process and manage the
             lifecycle of data in any organization.
                We have discussed the use of metadata and master data in creating an extremely agile
             and scalable solution for processing data with applications in the modern data
             warehouse. The next section will focus on implementing governance and realizing its
             benefits.


             Processing complexity of big data

             The most complicated step in processing big data lies not just with the volume or
             velocity of the data but also with its:

               Variety of formats: data can be presented for processing as Excel spreadsheets,
                Word documents, PDF files, OCR data, email, data from content management
                platforms, data from legacy applications, and data from web applications. Sometimes
                it may be variations of the same data over many time periods in which the metadata
                changed significantly.
               Ambiguity of data: this can arise from simple issues such as naming conventions,
                similar column names for different types of data, or the same column storing
                different types of data. A lack of metadata and taxonomies can significantly delay
                the processing of this data.
               Abstracted layers of hierarchy: the most complex area in big data processing is
                the hidden layers of hierarchy. Data contained in textual, semistructured, image