
             The complexity design for data applications

             A new methodology can be adopted for understanding complexity and designing
             an architecture to handle it. The methodology combines steps carried over from
             traditional data processing techniques with new steps specific to big data. Here is a
             view of the methodology.
                The steps for managing complexity are shown in Fig. 5.2. These steps can be phased
             and sequenced depending on the infrastructure and the type of data we need to work
             with. In the world of big data, the complexity of the data is associated with its formats
             and the speed at which the data is produced. Let us start by understanding the sequencing
             of these steps with an example: the data is streaming data from either an airplane or a
             critical patient, and it arrives with several metadata elements embedded in it as it
             originates from the source. We need to understand how to deal with this data and learn,
             from what we discover, the aspects it portrays for use in computation and insight delivery.
             A minimal sketch of such a record appears below.
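
             As a purely illustrative sketch, the record below shows what a single patient-monitor
             event with embedded metadata might look like. The field names and values are assumptions
             for illustration only; they are not a schema defined by the methodology in Fig. 5.2.

             ```python
             # Hypothetical streaming event with metadata embedded at the source.
             # Field names (device_id, unit, sampled_at, ...) are illustrative assumptions.
             patient_event = {
                 "measurement": {"heart_rate": 112},        # the observed value
                 "metadata": {
                     "device_id": "icu-monitor-07",         # which sensor produced it
                     "patient_id": "P-4821",                # who it belongs to
                     "unit": "beats_per_minute",            # how to interpret the value
                     "sampled_at": "2019-03-14T08:22:05Z",  # when it was produced
                     "format": "json",                      # how it is encoded
                 },
             }
             ```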
                The first step in this process is to understand the myriad complexities of the data we
             receive. There are two ways to understand the data: one is to collect sample data sets and
             study them; the other is to stream the data and set up algorithms to understand it as it
             arrives. The second technique is the one that big data algorithms help us facilitate.
             The big data algorithms that manage and process streaming data are designed and
             implemented to process, tag, and analyze large and constantly moving volumes of data.
             There are many different ways to accomplish this, each with its own advantages and
             disadvantages. One approach is native stream processing, also called tuple-at-a-time
             processing. In this technique, every event is processed as it comes in, one after
             the other, resulting in the lowest possible latency. Unfortunately, processing every
             incoming event is also computationally expensive, especially if we are doing this in
             memory and the data is complex by nature. The other technique is called micro-batch
             processing, and in this approach we make the opposite tradeoff, dividing the incoming
             events into small batches and processing each batch as a unit, trading some latency for
             throughput. A minimal sketch contrasting the two approaches follows Fig. 5.2.



                   Data Acquisition
                   Data Discovery
                   Data Exploration
                   Data Attribution & Definition
                   Data Analysis
                   Data Tagging
                   Data Segmentation & Classification
                   Data Storyboarding
                   Data Lake Design
                   Data Hub Design
                   Analytics

                                      FIGURE 5.2 Data complexity methodology.
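
             To make the tradeoff described above concrete, the following minimal sketch contrasts
             tuple-at-a-time processing with micro-batch processing over the same simulated stream.
             It is plain Python with invented event fields and batching parameters, not code from
             any particular streaming framework.

             ```python
             from typing import Dict, Iterable, List

             def event_stream(n: int = 10) -> Iterable[Dict]:
                 """Simulate a constantly arriving stream of events."""
                 for i in range(n):
                     yield {"seq": i, "value": i * 1.5, "source": "icu-monitor-07"}

             def process(event: Dict) -> Dict:
                 """Stand-in for the per-event tag/analyze work."""
                 event["tagged"] = event["value"] > 5.0
                 return event

             # 1) Native stream (tuple-at-a-time) processing:
             #    every event is handled the moment it arrives -> lowest latency,
             #    but the processing cost is paid on every single record.
             def tuple_at_a_time(stream: Iterable[Dict]) -> List[Dict]:
                 results = []
                 for event in stream:
                     results.append(process(event))   # handle each event immediately
                 return results

             # 2) Micro-batch processing:
             #    events are buffered into small batches (here by count; a time
             #    window is equally common) and each batch is processed as a unit,
             #    trading some latency for amortized, cheaper processing.
             def micro_batch(stream: Iterable[Dict], batch_size: int = 4) -> List[Dict]:
                 results, batch = [], []
                 for event in stream:
                     batch.append(event)
                     if len(batch) >= batch_size:
                         results.extend(process(e) for e in batch)  # process the whole batch
                         batch.clear()
                 if batch:                                          # flush the final partial batch
                     results.extend(process(e) for e in batch)
                 return results

             if __name__ == "__main__":
                 print(len(tuple_at_a_time(event_stream())), "events processed one at a time")
                 print(len(micro_batch(event_stream())), "events processed in micro-batches")
             ```

             Whether to batch by count or by time window, and how large to make each batch, is the
             knob that trades latency against throughput in the micro-batch approach.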