Establishing the existence of a new form of matter is a rare achievement, but the result has resonance in another field: cosmology, the scientific study of how the entire universe began and developed into the form we now witness. For many years, cosmologists studying the Big Bang theory were stymied. They had pieced together a robust description of how the universe evolved from a split second after the beginning, but they were unable to give any insight into what drove space to start expanding in the first place. What force could have exerted such a powerful outward push? For all its success, the Big Bang theory left out the bang. The LHC's confirmation that at least one such field actually exists thus puts a generation of cosmological theorizing on a far firmer foundation.
Lessons Learned: The significant lessons we have learned from discussing the CERN situation, its outcomes with the Big Data Analytics implementation, and its future goals include the following:
Problem Statement: Define the problem clearly, including the symptoms, situations, issues, risks, and anticipated resolutions. The CERN team has followed this process since the inception of the LEP and throughout the lifecycle of all its associated devices; they also defined the gaps and areas of improvement to be addressed, all of which were captured in the LHC process.
Define solution: This segment should identify all possible solutions for each area of the problem. The solution can consist of multiple tools and heterogeneous technology stacks integrated for a definitive, scalable, flexible, and secure outcome. The definition of the solution should include analytics, formulas, data quality, data cleansing, transformation, rules, exceptions, and workarounds; these steps need to be executed for each area, with every process defined clearly, as sketched below. The CERN team has implemented this approach and added governance to ensure that the steps are completed as specified and that no gaps are left unaddressed; where gaps exist, they are tagged, with tasks associated with the tags for completion.
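As an illustration only, the sketch below shows one way such a solution definition might be expressed in code: a small set of named data quality rules applied to incoming records, with failing records routed to an exceptions list for workarounds. The rule names, fields, and record layout here are hypothetical assumptions, not CERN's actual pipeline.

```python
# A minimal sketch of a rules-driven cleansing/transformation step.
# Fields such as "detector_id" and "energy_gev" are illustrative only.

def not_null(field):
    return lambda rec: rec.get(field) is not None

def in_range(field, lo, hi):
    return lambda rec: rec.get(field) is not None and lo <= rec[field] <= hi

CLEANSING_RULES = [
    ("detector_id present", not_null("detector_id")),
    ("energy in range",     in_range("energy_gev", 0.0, 14_000.0)),
]

def cleanse(records):
    """Apply each named rule; route failures to an exceptions list."""
    clean, exceptions = [], []
    for rec in records:
        failed = [name for name, rule in CLEANSING_RULES if not rule(rec)]
        if failed:
            exceptions.append({"record": rec, "failed_rules": failed})
        else:
            clean.append(rec)
    return clean, exceptions
```

Keeping the rules as named, data-driven entries rather than inline conditionals makes it straightforward to tag a gap in the solution and attach a task to it, in the spirit of the governance described above.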
Step-by-step execution: This is an essential mechanism for learning how to become successful. The discovery of the Higgs field shows that every step must be iterated multiple times to analyze its foundational aspects, which provides the insight needed to drill through to greater depths. This step-by-step process has repeatedly been shown to bring success: whether the work is cancer research or in-depth particle physics research, moving from concept to proof demands discrete steps, with outcomes recorded at each step, adjustments made and recorded, and the step reprocessed and its outcomes recorded again (a sketch of this loop follows).
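A minimal sketch of such an iterate-record-adjust loop, assuming hypothetical run_step, accept, and adjust callbacks that stand in for whatever analysis a real experiment performs at each step:

```python
# A minimal sketch of step-by-step execution with recorded outcomes.
def iterate_step(step_name, params, run_step, accept, adjust, max_iterations=10):
    """Run one step repeatedly, logging each outcome, until accepted."""
    log = []
    for attempt in range(1, max_iterations + 1):
        outcome = run_step(params)
        log.append({"step": step_name, "attempt": attempt,
                    "params": dict(params), "outcome": outcome})
        if accept(outcome):
            break
        params = adjust(params, outcome)  # adjust, then reprocess the step
    return log

# Example: tighten a selection threshold until the sample is pure enough.
log = iterate_step(
    "select-candidates",
    {"threshold": 0.5},
    run_step=lambda p: {"purity": min(1.0, p["threshold"] + 0.3)},
    accept=lambda o: o["purity"] >= 0.9,
    adjust=lambda p, o: {"threshold": p["threshold"] + 0.1},
)
```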
In big data applications, step-by-step execution is very much possible with the data collected in HDFS at the raw operational level, which can be explored, discovered, experimented on, and reconstructed through multiple methods and cycles of analysis performed on the data. All of this is possible within the HDFS layers, which provide us the playground to prove the possibilities, as the sketch below illustrates.
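For instance, a minimal PySpark sketch of this kind of exploration over a raw HDFS layer might look as follows; the HDFS path and column names are assumptions for illustration, not CERN's actual layout:

```python
# A minimal PySpark sketch of exploring raw data landed in HDFS.
# The path "hdfs:///data/raw/events/" and the columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("raw-exploration").getOrCreate()

# Read the raw operational layer as-is; nothing is discarded at this stage.
raw = spark.read.parquet("hdfs:///data/raw/events/")

# Discovery pass: profile the data before committing to a transformation.
raw.printSchema()
raw.select(F.count("*").alias("rows"),
           F.countDistinct("run_id").alias("runs")).show()

# Experimentation pass: each analysis cycle reads the same raw layer,
# so a step can be rerun with different logic without losing the source.
candidates = (raw
    .filter(F.col("energy_gev") > 100.0)
    .groupBy("run_id")
    .agg(F.avg("energy_gev").alias("avg_energy")))
candidates.show()
```

Because the raw layer is immutable, each cycle of exploration can be rerun, adjusted, and recorded without affecting the source data, which is what makes the step-by-step discipline practical at this scale.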
The cost models are not necessarily cheap; CERN, for example, has spent over $1B on infrastructure worldwide over the years, but