                and discoveries that need to be revalidated due to lack of clarity on the process
                complexity.
•  The system can adapt itself according to its history or feedback. This is another vital complexity that needs to be handled: the system's self-adaptability needs to be defined and documented as a requirement, and the specific outcomes of this behavior need to be articulated in fine-grained detail, which will help manage the process complexity. These aspects are not handled well even by the best requirements processes, and they often delay acceptance of systems during the user acceptance testing stages, whether we follow waterfall or agile development.
•  The relations between the system and its environment are nontrivial or nonlinear. This aspect is very important to understand, as the Internet of Things and big data applications today provide data from anywhere, all the time. Changes can occur in the data within these nonlinear relationships; the changes need to be ingested, processed, and linked to the prior state, and analytics need to be recomputed based on the changed data. This complexity has to be defined and handled (a minimal sketch of such change handling follows this list). Whether a change is trivial or nontrivial depends on the aspect of data we are working with. For example, if we are monitoring the stock markets and receive streaming feeds of both fake and real information, who can validate the data, and which aspects need to trigger an alert? If we act on the data and it turns out to be fake, who can validate and clarify? This simple process is a complexity of data in the world we live in today, and the internet makes it all the more complex by providing news 24/7/365 to the world.
•  The system is highly sensitive to initial conditions. In scientific research this complexity is present every minute. It is complex because the initiator of any experiment has a specific set of conditions under which they want to study and collect the behaviors of the system. The resulting data set is very relevant to the person studying it, but the initial sensitivity needs to be documented to make the data, or the system, relevant to other users. Initial conditions are essential in scientific research, and on big data platforms the application can be built following the steps of each stage, with the initial conditions captured in specific files and reused to understand the outcomes (sketched below). We did not have this capability in the traditional data world due to the formats and compliance constraints of data structures.
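
As a minimal, hypothetical sketch of the change-handling idea above, the following Python snippet links each incoming stream record to the prior state for its key, classifies the change as trivial or nontrivial, and raises an alert when a nontrivial change arrives from a feed that has not been validated. The record schema, the threshold, and the validated_sources set are illustrative assumptions, not part of the original text.

    # Hypothetical sketch: link streaming updates to prior state and alert
    # on nontrivial changes from unvalidated feeds. Names and the threshold
    # are illustrative assumptions, not a prescribed design.

    prior_state = {}                        # last known price per ticker
    validated_sources = {"exchange_feed"}   # feeds we trust (assumption)
    CHANGE_THRESHOLD = 0.05                 # >5% move is nontrivial (assumption)

    def process_tick(tick):
        """tick: dict with 'ticker', 'price', 'source' keys (assumed schema)."""
        ticker, price, source = tick["ticker"], tick["price"], tick["source"]
        previous = prior_state.get(ticker)
        prior_state[ticker] = price         # persist the new state

        if previous is None:
            return "initial"                # nothing to compare against yet

        change = abs(price - previous) / previous
        if change <= CHANGE_THRESHOLD:
            return "trivial"                # small move, no recompute needed
        if source not in validated_sources:
            return "alert: nontrivial change from unvalidated feed"
        return "recompute analytics"        # nontrivial change from a trusted feed

    print(process_tick({"ticker": "ACME", "price": 100.0, "source": "exchange_feed"}))
    print(process_tick({"ticker": "ACME", "price": 120.0, "source": "social_feed"}))

Continuing the sensitivity point above, a minimal sketch of capturing initial conditions in a specific file, so that other users of the data can reload the exact starting state, might look as follows; the file name, fields, and JSON format are assumptions for illustration.

    import json

    # Hypothetical sketch: persist an experiment's initial conditions so the
    # run can be reproduced and interpreted by other users of the data.
    initial_conditions = {
        "experiment_id": "exp-001",   # illustrative values, not from the text
        "temperature_c": 21.5,
        "sample_size": 10000,
        "random_seed": 42,
    }

    with open("exp-001_initial_conditions.json", "w") as f:
        json.dump(initial_conditions, f, indent=2)

    # Later, any user can reload the exact starting state before rerunning
    # or re-analyzing the experiment.
    with open("exp-001_initial_conditions.json") as f:
        conditions = json.load(f)
    print(conditions["random_seed"])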
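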

The data and system complexity are further understood as we look at the layers of data that we will need to assimilate for building applications. The data layers here will need to be defined, managed, transformed, logged, aggregated, and made available. This series of layers is described in Fig. 5.1.
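
As an illustrative, non-authoritative sketch of that series of layers, the following Python outline passes raw records through define, transform, log, manage, and aggregate steps before making the result available to analytics; the function names and record shape are assumptions made for this example.

    import logging

    logging.basicConfig(level=logging.INFO)

    # Hypothetical sketch of the layer series: each function stands in for
    # one layer named in the text; the record shape is an assumption.

    def define(raw):
        # Define: attach a schema/meaning to the raw value.
        return {"sensor_id": raw[0], "reading": float(raw[1])}

    def transform(record):
        # Transform: normalize units (assumed: raw reading is in millivolts).
        record["reading_v"] = record["reading"] / 1000.0
        return record

    def aggregate(records):
        # Aggregate: roll granular readings up to an analytics-ready average.
        return sum(r["reading_v"] for r in records) / len(records)

    raw_feed = [("s1", "1200"), ("s1", "1300"), ("s1", "1250")]
    managed = []                                # Manage: retained, governed copies
    for raw in raw_feed:
        record = transform(define(raw))
        logging.info("ingested %s", record)     # Log: record each step
        managed.append(record)

    available = {"s1_avg_v": aggregate(managed)}   # Available for analytics
    print(available)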
As seen in Fig. 5.1, we have the innate capability to take data at its raw granularity layer, as a small particle, and by the time it becomes a layer in the analytics, we apply transformations and aggregations, decide to change formats, leave attributes