Page 97 - Building Big Data Applications



             compute on database platforms, and we need to execute streaming analytics in memory
             as the data streams in. The challenge here is that we collect several terabytes of data
             from source-generated files but need to provide 100-200 GB of new extracts for analytics,
             while still retaining access to operational data for running analytics and exploration.
                To process the data, the new platforms added included Apache Hadoop, Apache Kafka,
             Apache Spark, Apache Flume, Apache Impala, Oracle, and a NoSQL database. This data
             processing architecture is integrated with the existing ecosystem of Oracle databases,
             SAS, and analytics systems. The Apache stack selected is shown in the picture below.

                The Hadoop configuration implemented at CERN includes the following:
               Bare-metal Hadoop/YARN clusters
                  Five clusters
                  110+ nodes
                  14+ PB storage
                  20+ TB memory
                  3100+ cores
                  HDDs and SSDs
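
                To get a feel for the scale of these figures, we can work out rough per-node
             averages. The sketch below is only back-of-the-envelope arithmetic on the lower-bound
             totals listed above. It assumes decimal units (1 PB = 1000 TB) and treats all five
             clusters as one pool, which the source does not state; the function name is ours and
             real nodes will certainly differ from these averages.

```python
# Lower-bound totals quoted in the list above ("+" means "at least").
NODES = 110
STORAGE_PB = 14
MEMORY_TB = 20
CORES = 3100


def per_node_averages(nodes, storage_pb, memory_tb, cores):
    """Return rough per-node averages, assuming decimal units (1 PB = 1000 TB)."""
    return {
        "storage_tb": storage_pb * 1000 / nodes,
        "memory_gb": memory_tb * 1000 / nodes,
        "cores": cores / nodes,
    }


averages = per_node_averages(NODES, STORAGE_PB, MEMORY_TB, CORES)
for name, value in averages.items():
    print(f"{name}: {value:.1f}")
# storage_tb: 127.3
# memory_gb: 181.8
# cores: 28.2
```

             Because every total is a floor rather than an exact count, these averages are
             indicative only.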
                Access to data is provided through Active Directory, and native security rules are
             applied at each layer of access from the Grid to Hadoop. The rules provide encryption,
             decryption, hierarchies, and granularity of access. The authorization policy is
             implemented in the rules, and authentication is handled by Active Directory.
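
                The split between authentication (who the user is) and authorization (what the
             rules allow) can be illustrated with a minimal sketch. Everything in it is hypothetical:
             the in-memory DIRECTORY dict stands in for Active Directory, the Rule fields, group
             name, and paths are invented for illustration, and credential checks and encryption are
             omitted. It is not CERN's implementation.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a directory service. A real deployment would
# query Active Directory and verify credentials; both are omitted here.
DIRECTORY = {"alice": {"groups": {"physics-analysts"}}}


@dataclass(frozen=True)
class Rule:
    group: str          # directory group the rule applies to
    path_prefix: str    # resource subtree the rule covers (hierarchy)
    actions: frozenset  # actions permitted on that subtree (granularity)


# Illustrative rules: read access to a tree, read/write to one subtree.
RULES = [
    Rule("physics-analysts", "/data/experiments/", frozenset({"read"})),
    Rule("physics-analysts", "/data/experiments/scratch/",
         frozenset({"read", "write"})),
]


def authenticate(user):
    """Authentication: look the user up and return their groups, or None."""
    entry = DIRECTORY.get(user)
    return entry["groups"] if entry else None


def authorize(groups, path, action):
    """Authorization: allow if any rule for the user's groups covers the request."""
    return any(
        rule.group in groups
        and path.startswith(rule.path_prefix)
        and action in rule.actions
        for rule in RULES
    )


def can_access(user, path, action):
    """Combine both steps: unknown users are rejected before rules are checked."""
    groups = authenticate(user)
    return groups is not None and authorize(groups, path, action)


print(can_access("alice", "/data/experiments/run1.root", "read"))   # True
print(can_access("alice", "/data/experiments/run1.root", "write"))  # False
print(can_access("bob", "/data/experiments/run1.root", "read"))     # False
```

             Keeping the two steps in separate functions mirrors the text: the identity source can
             change without touching the rules, and the rules can be refined without touching the
             directory.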
                The end-user analysts and physicists at CERN use Jupyter notebooks with a PySpark
             implementation to work on all the data. The notebooks use Impala, Pig, and Python,
             and the CERN team has added several innovations to adapt the Apache stack to its
             specific requirements. We will discuss these innovations in the next segment.
                Innovations: