Page 174 - Building Big Data Applications

Chapter 9   Governance 173


                 is returned as personalized offers to the user. Sponsors of specific products and
                 services often provide such offers with incentives, which the recommender algorithm
                 presents to the user as part of its output.
                   How does machine learning use metadata and master data? In the search example we
                 discussed, metadata is derived for the search elements and tagged with additional
                 data as available. When the machine learning algorithm is executed, this data is
                 compared and processed with the data from the knowledge repository, which includes
                 semantic libraries and master data catalogs. The combination of metadata and master
                 data, along with the use of semantic libraries, provides better-quality data to the
                 machine learning algorithm, which in turn produces better-quality output for use by
                 hypothesis and prediction workflows.
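
                   The enrichment step described above can be sketched roughly as follows. This is
                 a minimal illustration, not the implementation of any particular product: the
                 catalog contents, the semantic library entries, and the function name
                 enrich_search_term are all hypothetical and chosen only for this example.

```python
# Hypothetical master data catalog: canonical product records keyed by ID.
MASTER_CATALOG = {
    "P-100": {"name": "trail running shoe", "category": "footwear"},
    "P-200": {"name": "rain jacket", "category": "outerwear"},
}

# Hypothetical semantic library: maps user vocabulary to canonical terms.
SEMANTIC_LIBRARY = {
    "sneakers": "trail running shoe",
    "trainers": "trail running shoe",
    "raincoat": "rain jacket",
}


def enrich_search_term(term, metadata):
    """Tag a search term with metadata, then resolve it against the
    semantic library and the master data catalog."""
    normalized = term.strip().lower()
    # Fall back to the normalized term when the library has no synonym.
    canonical = SEMANTIC_LIBRARY.get(normalized, normalized)

    master_record = None
    for product_id, record in MASTER_CATALOG.items():
        if record["name"] == canonical:
            master_record = {"product_id": product_id, **record}
            break

    return {
        "term": normalized,
        "canonical": canonical,
        "metadata": dict(metadata),
        "master_record": master_record,  # None when nothing matches
    }


enriched = enrich_search_term(" Sneakers ", {"channel": "mobile"})
```

                   The enriched record, rather than the raw search string, is what would be handed
                 to the learning algorithm, so terms that differ only in vocabulary resolve to the
                 same master record.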
                   Processing of highly numeric data, such as sensor data, financial data, or credit
                 card data, is based on patterns of numbers that serve as data inputs. These patterns
                 are processed through several mathematical models, and their outputs are stored in the
                 knowledge repository, which then shares the stored results back into the processing
                 loop of the machine learning implementation.
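
                   As a rough sketch of this feedback loop, the example below scores sensor
                 readings against the pattern stored so far and writes the results back for the
                 next pass. The dictionary standing in for the knowledge repository, the
                 function name score_readings, and the z-score threshold of 3.0 are assumptions
                 made for illustration; a real implementation would use several models and a
                 persistent store.

```python
from statistics import mean, pstdev


def score_readings(readings, repository, threshold=3.0):
    """Flag readings that deviate from the stored pattern, then write the
    accepted readings back to the repository for the next pass."""
    history = repository.setdefault("history", [])
    flagged = []
    for value in readings:
        # Only score once at least two earlier readings define a pattern.
        if len(history) >= 2:
            center = mean(history)
            spread = pstdev(history)
            if spread > 0 and abs(value - center) / spread > threshold:
                flagged.append(value)
                continue  # keep outliers out of the stored pattern
        history.append(value)
    repository.setdefault("flagged", []).extend(flagged)
    return flagged


# A plain dict stands in for the knowledge repository (an assumption).
repository = {}
first_pass = score_readings([20.1, 19.8, 20.3, 20.0, 19.9], repository)
second_pass = score_readings([20.2, 35.0, 19.7], repository)
```

                   The second call reuses the history stored by the first, which is the sense in
                 which stored results are shared back into the processing loop.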
                   Processing data such as images and videos uses conversion techniques to create
                 mathematical datasets for all the nontextual elements. These mathematical datasets are
                 processed through several combinations of data mining and machine learning algorithms,
                 including statistical analysis, linear regression, and polynomial curve fitting
                 techniques, to create outputs. These outputs are processed further to create a
                 noise-free set of outputs, which can be used for recreating the digital models of
                 image or video data (image only, not audio). Audio is processed as separate feeds and
                 associated with video processing datasets as needed.
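
                   To make the idea concrete, the sketch below treats one row of grayscale pixel
                 intensities as a numeric dataset and smooths it with an ordinary least-squares
                 line, one of the techniques named above. It is a deliberately small stand-in:
                 the pixel values and the function names fit_line and denoise_row are
                 hypothetical, and a real pipeline would combine several richer models.

```python
def fit_line(values):
    """Ordinary least-squares fit of value = slope * index + intercept.

    Expects at least two values so that the slope is defined.
    """
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def denoise_row(pixels):
    """Replace each pixel intensity with its value on the fitted line."""
    slope, intercept = fit_line(pixels)
    return [slope * i + intercept for i in range(len(pixels))]


# Hypothetical noisy gradient: intensities along one row of an image.
noisy_row = [10, 23, 28, 42, 48, 61]
smooth_row = denoise_row(noisy_row)
```

                   A single straight line only suits a row that is close to a linear gradient;
                 polynomial curve fitting extends the same least-squares idea to curved
                 intensity profiles.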
                   Machine-learning techniques reduce the complexity of processing big data. The most
                 common and popular algorithms for machine learning with web-scale data processing are
                 available as open source in the Apache Mahout project. Mahout is designed to be
                 deployed on Hadoop with minimal configuration effort and can scale very effectively.
                 While not all machine learning algorithms require an enterprise data scientist, this
                 is the most complex area in the processing of large datasets, and having a team of
                 data scientists will be useful for any enterprise.
                   As we see from the discussions in this chapter, processing big data applications is
                 indeed a complex and challenging process. Because this type of processing leaves very
                 little room for error, the quality of the data used for processing needs to be
                 pristine. This can be accomplished by implementing a data-driven architecture
                 that uses all the enterprise data assets available to create a powerful foundation for
                 analysis and integration of data across Big Data and the DBMS. This foundational
                 architecture is what defines the next generation of data warehouse, where all types of
                 data are stored and processed to empower the enterprise toward making and executing
                 profitable decisions.