Page 98 - Building Big Data Applications

Chapter 4   Scientific research applications and usage  93


                 XRootD filesystem interface project

                 The CERN team evaluated the Apache stack and identified a few gaps between their
                 current technology and the new stack that was meant to augment it. The main gap was
                 that all physics files were written using the ROOT framework, which is developed in C++,
                 and the ROOT file format could not be loaded into Avro or Spark. The CERN team partnered
                 with the DIANA-HEP team to create the XRootD filesystem interface project, which was
                 designed to load physics files into HDFS and Spark. Details of the project can be found
                 at http://xrootd.org, and the GitHub page for the project is at
                 https://github.com/cerndb/hadoop-xrootd.

                                                   [Figure: XRootD Project]


                   XRootD: The XRootD project aims to provide high-performance, scalable, fault-tolerant
                 access to data repositories of many kinds, typically file-based ones. It is built on a
                 scalable architecture, a communication protocol, and a set of plug-ins and tools based
                 on those. Because XRootD can be freely configured and scaled (in both size and
                 performance), it can be used to deploy data access clusters of virtually any size, with
                 sophisticated features such as authentication/authorization, integration with other
                 systems, and wide-area (WAN) data distribution. The XRootD software framework is a fully
                 generic suite for fast, low-latency, and scalable data access that can natively serve any
                 kind of data organized as a hierarchical, filesystem-like namespace based on the concept
                 of directories.
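                 As a minimal sketch of the namespace idea, assuming Python 3.9+ and the commonly used
                 root://host[:port]//absolute/path URL form with XRootD's default port 1094, the snippet
                 below splits an XRootD URL into host, port, and path, and lists the directories above a
                 file. The function names and the host xrootd.example.org are hypothetical and only
                 illustrative; this is not part of the XRootD client API.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

# Default XRootD port, used when the URL does not specify one.
XROOTD_DEFAULT_PORT = 1094


@dataclass(frozen=True)
class XRootDURL:
    host: str
    port: int
    path: str


def parse_xrootd_url(url: str) -> XRootDURL:
    """Split a root:// (or xroot://) URL into host, port, and absolute path.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("root", "xroot"):
        raise ValueError(f"not an XRootD URL: {url!r}")
    # XRootD URLs put the absolute path after a double slash (root://host//path);
    # urlsplit keeps that as "//path", so normalise it to a single leading slash.
    path = "/" + parts.path.lstrip("/")
    return XRootDURL(parts.hostname or "", parts.port or XROOTD_DEFAULT_PORT, path)


def parent_directories(path: str) -> list[str]:
    """Return the directories above a file in a hierarchical namespace, top first."""
    parts = [p for p in path.split("/") if p]
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]


if __name__ == "__main__":
    # Hypothetical file on a hypothetical XRootD server.
    url = parse_xrootd_url("root://xrootd.example.org//store/data/run1/events.root")
    print(url)
    print(parent_directories(url.path))
```

                 Keeping the path as a plain directory hierarchy is what lets tools such as a Hadoop
                 filesystem connector treat remote XRootD data much like files on a local or HDFS path.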


                 Service for web-based analysis (SWAN)

                 CERN has packaged and built a web browser-based service layer for analysis. This
                 service, called SWAN, combines the Jupyter notebook, Python, C++, ROOT, Java, Spark,
                 and several other API interfaces. The package is available for download and use by
                 anyone who works with CERN. The SWAN service is available at https://swan.web.cern.ch.
                   CERN has also developed several other innovations for managing large files, streaming
                 analytics, in-memory analytics, and Kerberos security plug-ins.