Establishing the existence of a new form of matter is a rare achievement, but the
result has resonance in another field: cosmology, the scientific study of how the entire
universe began and developed into the form we now witness. For many years,
cosmologists studying the Big Bang theory were stymied. They had pieced together
a robust description of how the universe evolved from a split second after the beginning,
but they were unable to give any insight into what drove space to start expanding in the
first place. What force could have exerted such a powerful outward push? For all its
success, the Big Bang theory left out the bang. The LHC’s confirmation that at least one
such field actually exists thus puts a generation of cosmological theorizing on a far firmer
foundation.
Lessons Learned: The key lessons from the CERN experience, its outcomes with the
Big Data analytics implementation, and its future goals include the following:
Problem Statement: Define the problem clearly, including the symptoms, situations,
issues, risks, and anticipated resolutions. The CERN team has followed this process since
the inception of the LEP and throughout the lifecycle of all its associated devices; they
also identified the gaps and areas of improvement to be addressed, all of which were
defined in the LHC process.
Define Solution: This segment should identify all possible solutions for each area of
the problem. The solution can consist of multiple tools and heterogeneous technology
stacks integrated for a definitive, scalable, flexible, and secure outcome. The definition
of the solution should include analytics, formulas, data quality, data cleansing,
transformation, rules, exceptions, and workarounds. These steps need to be executed
for each area, with every process defined clearly. The CERN team has implemented this
approach and added governance to ensure that the steps are completed as specified and
no gaps are left unresolved; where gaps do exist, they are tagged, and tasks are associated
with those tags for eventual completion.
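For illustration only, the sketch below shows one way such a solution register with tagged gaps might be kept in code. The class name, fields, and example entries are hypothetical and are not drawn from CERN's actual governance tooling.

    # Hypothetical solution-definition register with governance tags for gaps.
    from dataclasses import dataclass, field

    @dataclass
    class SolutionArea:
        name: str
        analytics: list = field(default_factory=list)  # analytics and formulas applied
        rules: list = field(default_factory=list)       # quality, cleansing, transformation rules
        gaps: list = field(default_factory=list)        # open gaps, each tagged with a follow-up task

        def tag_gap(self, description, task):
            """Record an open gap together with the task assigned to close it."""
            self.gaps.append({"gap": description, "task": task})

    # Example usage with made-up entries
    area = SolutionArea(name="collision event cleansing")
    area.rules.append("drop events with incomplete detector readout")
    area.tag_gap("calibration drift not handled", "define recalibration workflow")
    print(area.gaps)

A register like this makes the governance idea concrete: no area is considered complete while its gaps list still holds untagged or unassigned items.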
Step-by-Step Execution: Executing step by step is an essential mechanism for success.
The discovery of the Higgs field shows that every step must be iterated multiple times to
analyze the foundational aspects, which yields more insights to drill through to greater
depths. This step-by-step process is repeatedly seen to bring success: whether the work
is cancer research or in-depth particle physics research, moving from concept to proof
demands discrete steps with outcomes at each step, adjustments recorded, the step
reprocessed, and the new outcomes recorded, as sketched below.
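As a rough illustration of that record-adjust-reprocess loop, the following Python sketch runs a single analysis step repeatedly, records the outcome of each pass, and tweaks a parameter before reprocessing. The step logic, the cut parameter, and the stopping rule are assumptions made up for the example, not the actual Higgs analysis workflow.

    # Minimal sketch of a step-by-step loop: execute, record, adjust, reprocess.
    def run_step(name, params, data):
        """Execute one analysis step and return a recorded outcome."""
        # Placeholder analysis: count records above an illustrative cut value.
        passed = [x for x in data if x > params["cut"]]
        return {"step": name, "params": dict(params), "selected": len(passed)}

    def iterate(data, params, target, max_iterations=10):
        """Reprocess a step, recording each outcome and adjusting parameters."""
        outcomes = []
        for i in range(max_iterations):
            outcome = run_step(f"iteration-{i}", params, data)
            outcomes.append(outcome)           # record the outcome of this pass
            if outcome["selected"] <= target:  # stop once the goal is met
                break
            params["cut"] *= 1.1               # adjust, then reprocess the step
        return outcomes

    if __name__ == "__main__":
        raw = [0.2, 1.5, 3.7, 0.9, 2.8, 4.1, 0.4]  # stand-in for raw event data
        for record in iterate(raw, {"cut": 1.0}, target=2):
            print(record)

The point of the sketch is the bookkeeping, not the analysis: every pass leaves a recorded outcome that can be compared against earlier passes before the next adjustment is made.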
In big data applications, this step-by-step execution is very feasible with the data
collected in HDFS at the raw operational level, where it can be explored, discovered,
experimented on, and reconstructed through multiple methods and cycles, with detailed
analysis performed on the data. All of this is possible within the HDFS layers, which
provide the playground to prove the possibilities. The cost models are not necessarily
cheap; CERN, for example, has spent over $1B on infrastructure worldwide over the years, but