                and discoveries that need to be revalidated due to lack of clarity on the process
                complexity.
•  The system can adapt itself according to its history or feedback. This is another vital complexity that needs to be handled: the system's self-adaptability needs to be defined and documented as a requirement, and the specific outcomes of this behavior need to be articulated in fine-grained detail, which will help manage the process complexity. These aspects are not handled well even by the best requirements processes, and they often delay acceptance of systems during the user acceptance testing stages, whether we follow waterfall or agile development.
•  The relations between the system and its environment are nontrivial or nonlinear. This aspect is very important to understand, as the Internet of Things and big data applications today provide data from anywhere, all the time. Changes can occur in the data within these nonlinear relationships; the changes need to be ingested, processed, and linked to the prior state, and analytics need to be recomputed based on the changed data. This complexity has to be defined and handled (a minimal sketch of such change handling follows this list). Whether a change is trivial or nontrivial depends on the aspect of data we are working with. For example, if we are monitoring the stock markets and receive streaming feeds of both fake and real information, who can validate the data, and which aspects need to trigger an alert? If we act on the data and it turns out to be fake, who can validate and clarify? This simple process is a complexity of data in the world we live in today, and the internet makes it all the more complex by providing news 24/7/365 to the world.
•  The system is highly sensitive to initial conditions. In scientific research this complexity is present every minute. It is complex because the initiator of any experiment has a specific set of conditions under which they want to study and collect the behaviors of the system. The resulting data set is very relevant to the person studying it, but the initial sensitivity needs to be documented to make the data, or the system, relevant to other users. Initial conditions are essential in scientific research, and on big data platforms the application can be built following the steps of each stage, with the initial conditions captured in specific files and reused to understand the outcomes (sketched below). We did not have this capability in the traditional data world due to the formats and compliance constraints of data structures.
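
As a minimal, hypothetical sketch of the change-handling idea above, the following Python snippet links each incoming stream record to the prior state for its key, classifies the change as trivial or nontrivial, and raises an alert when a nontrivial change arrives from a feed that has not been validated. The record schema, the threshold, and the validated_sources set are illustrative assumptions, not part of the original text.

    # Hypothetical sketch: link streaming updates to prior state and alert
    # on nontrivial changes from unvalidated feeds. Names and the threshold
    # are illustrative assumptions, not a prescribed design.

    prior_state = {}                        # last known price per ticker
    validated_sources = {"exchange_feed"}   # feeds we trust (assumption)
    CHANGE_THRESHOLD = 0.05                 # >5% move is nontrivial (assumption)

    def process_tick(tick):
        """tick: dict with 'ticker', 'price', 'source' keys (assumed schema)."""
        ticker, price, source = tick["ticker"], tick["price"], tick["source"]
        previous = prior_state.get(ticker)
        prior_state[ticker] = price         # persist the new state

        if previous is None:
            return "initial"                # nothing to compare against yet

        change = abs(price - previous) / previous
        if change <= CHANGE_THRESHOLD:
            return "trivial"                # small move, no recompute needed
        if source not in validated_sources:
            return "alert: nontrivial change from unvalidated feed"
        return "recompute analytics"        # nontrivial change from a trusted feed

    print(process_tick({"ticker": "ACME", "price": 100.0, "source": "exchange_feed"}))
    print(process_tick({"ticker": "ACME", "price": 120.0, "source": "social_feed"}))

Continuing the sensitivity point above, a minimal sketch of capturing initial conditions in a specific file, so that other users of the data can reload the exact starting state, might look as follows; the file name, fields, and JSON format are assumptions for illustration.

    import json

    # Hypothetical sketch: persist an experiment's initial conditions so the
    # run can be reproduced and interpreted by other users of the data.
    initial_conditions = {
        "experiment_id": "exp-001",   # illustrative values, not from the text
        "temperature_c": 21.5,
        "sample_size": 10000,
        "random_seed": 42,
    }

    with open("exp-001_initial_conditions.json", "w") as f:
        json.dump(initial_conditions, f, indent=2)

    # Later, any user can reload the exact starting state before rerunning
    # or re-analyzing the experiment.
    with open("exp-001_initial_conditions.json") as f:
        conditions = json.load(f)
    print(conditions["random_seed"])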
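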

The data and system complexity are further understood as we look at the layers of data that we will need to assimilate for building applications. The data layers here will need to be defined, managed, transformed, logged, aggregated, and made available. This series of layers is described in Fig. 5.1.
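
As an illustrative, non-authoritative sketch of that series of layers, the following Python outline passes raw records through define, transform, log, manage, and aggregate steps before making the result available to analytics; the function names and record shape are assumptions made for this example.

    import logging

    logging.basicConfig(level=logging.INFO)

    # Hypothetical sketch of the layer series: each function stands in for
    # one layer named in the text; the record shape is an assumption.

    def define(raw):
        # Define: attach a schema/meaning to the raw value.
        return {"sensor_id": raw[0], "reading": float(raw[1])}

    def transform(record):
        # Transform: normalize units (assumed: raw reading is in millivolts).
        record["reading_v"] = record["reading"] / 1000.0
        return record

    def aggregate(records):
        # Aggregate: roll granular readings up to an analytics-ready average.
        return sum(r["reading_v"] for r in records) / len(records)

    raw_feed = [("s1", "1200"), ("s1", "1300"), ("s1", "1250")]
    managed = []                                # Manage: retained, governed copies
    for raw in raw_feed:
        record = transform(define(raw))
        logging.info("ingested %s", record)     # Log: record each step
        managed.append(record)

    available = {"s1_avg_v": aggregate(managed)}   # Available for analytics
    print(available)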
As seen in Fig. 5.1, we have the innate capability to take data at its raw granularity layer, as a small particle, and by the time it becomes a layer in the analytics, we apply transformations and aggregations, decide to change formats, leave attributes