Data tagging is the process of creating an identifying link on the data for metadata integration.

Data classification is the process of creating subsets of value pairs for data processing and integration. An example of this is extracting website URLs from clickstream data along with the associated page-view information (see the sketch after this list).

Data modeling is the process of creating a model for data visualization or analytics. The output from this step can be combined with an extraction exercise.
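To make the tagging and classification steps concrete, here is a minimal Python sketch over clickstream data. The log format, field names, and metadata tags are assumptions made purely for illustration; they are not prescribed by the text.

```python
import re
from datetime import datetime, timezone

# Assumed clickstream line format (illustrative only):
# "2024-01-15T10:22:31Z 10.0.0.7 GET /products/widget?id=42 200"
LOG_PATTERN = re.compile(
    r"(?P<ts>\S+)\s+(?P<ip>\S+)\s+(?P<method>\S+)\s+(?P<url>\S+)\s+(?P<status>\d+)"
)

def tag_record(raw_line: str) -> dict:
    """Data tagging: attach identifying metadata to the raw record."""
    return {
        "raw": raw_line,
        "source": "web_clickstream",  # assumed metadata tag
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }

def classify_record(tagged: dict) -> dict | None:
    """Data classification: extract URL/page-view value pairs for processing."""
    match = LOG_PATTERN.match(tagged["raw"])
    if not match:
        return None  # record cannot be classified under this format
    fields = match.groupdict()
    return {
        "url": fields["url"],
        "page_view": {"timestamp": fields["ts"], "visitor_ip": fields["ip"]},
        "metadata": {k: tagged[k] for k in ("source", "ingested_at")},
    }

line = "2024-01-15T10:22:31Z 10.0.0.7 GET /products/widget?id=42 200"
print(classify_record(tag_record(line)))
```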

Once the data is prepared for analysis in the discovery stage, users can extract the result sets from any stage and use them for integration. These steps require a combined skill set of data analytics and statistical modeling, which is the role of a data scientist. The question that confronts users today is how to do the data discovery: do you develop MapReduce code extensively, or do you use software like Tableau or Apache Presto? The answer is simple: rather than developing extensive lines of MapReduce code, which may not be reusable, you can adopt data discovery and analysis tools that can produce the MapReduce code based on the operations you execute, sparing you the hand-written plumbing sketched below.
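To give a sense of what that hand-written MapReduce plumbing looks like, here is a small Hadoop Streaming job in Python that counts page views per URL. The clickstream field layout is an assumption carried over from the earlier sketch, not something the text specifies.

```python
#!/usr/bin/env python3
# mapper.py -- Hadoop Streaming mapper: emit (url, 1) per clickstream record.
import sys

for line in sys.stdin:
    parts = line.split()
    if len(parts) >= 4:          # assumed layout: timestamp ip method url [status]
        print(f"{parts[3]}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- Hadoop Streaming reducer: sum page views per URL.
# Hadoop delivers mapper output sorted by key, so equal URLs arrive together.
import sys

current_url, count = None, 0
for line in sys.stdin:
    url, _, value = line.rstrip("\n").partition("\t")
    if url != current_url:
        if current_url is not None:
            print(f"{current_url}\t{count}")
        current_url, count = url, 0
    count += int(value)
if current_url is not None:
    print(f"{current_url}\t{count}")
```

Such a job would be submitted with something along the lines of hadoop jar hadoop-streaming.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input /clickstream -output /pageviews (paths illustrative). Discovery tools generate the equivalent of this plumbing for you, which is precisely the argument for using them.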
Whichever method you choose to architect the solution, your data discovery framework is the key to developing big data analytics within your organization. Once the data is ready for visualization, you can integrate it with mash-ups and other powerful visualization tools and provide the resulting dashboards to the users.


                 Visualization

Big data visualization is not like traditional business intelligence, where the data is interactive and can be processed as drill-downs and roll-ups in a hierarchy, or drilled into in real time. This data is static in nature and will be minimally interactive in a visualization situation. The underlying reason for this static nature is the design of big data platforms like Hadoop or NoSQL, where the data is stored in files rather than in table structures; processing changes requires massive file operations, which are best performed in a microbatch environment as opposed to a real-time environment. This limitation is being addressed in the next generation of Hadoop and other big data platforms.
Today the data that is available for visualization is largely integrated using mash-up tools and software that supports such functionality, including Tableau and Spotfire. The mash-up platform provides the capability for the user to integrate data from multiple streams into one picture by linking common data between the different datasets.
For example, if you are looking at integrating customer sentiment analytics with campaign data, field sales data, and competitive research data, the mash-up created to view all of this information will link the customer sentiment with the campaign data using product and geography information, the competitive research data with the campaign data using geography information, and the sales data with the campaign data using product information.
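The linking logic of such a mash-up is essentially a set of joins on the shared fields. Here is a minimal pandas sketch of those three links; every frame, column name, and value is invented for the illustration.

```python
import pandas as pd

# Toy stand-ins for the four source datasets (all names and values invented).
sentiment = pd.DataFrame({"product": ["A", "A"], "geography": ["US", "EU"],
                          "sentiment_score": [0.72, 0.41]})
campaign  = pd.DataFrame({"product": ["A", "A"], "geography": ["US", "EU"],
                          "campaign_spend": [120_000, 80_000]})
research  = pd.DataFrame({"geography": ["US", "EU"],
                          "competitor_share": [0.31, 0.45]})
sales     = pd.DataFrame({"product": ["A"], "units_sold": [7500]})

# Sentiment linked to campaign data on product and geography ...
view = sentiment.merge(campaign, on=["product", "geography"])
# ... competitive research linked to campaign data on geography ...
view = view.merge(research, on="geography")
# ... and field sales linked to campaign data on product.
view = view.merge(sales, on="product")
print(view)
```

The resulting single frame is the "one picture" a mash-up platform presents, with each row carrying sentiment, spend, competitive share, and sales side by side.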