Page 223 - Building Big Data Applications

Index

Enterprise data warehouse (EDW), 6
Eroom’s law, 106
European Council for Nuclear Research (CERN)
  Hadoop configuration, 92
  Higgs Boson discovery, 85–86
    Big Bang theory, 96
    drag force, 94
    governance, 97
    Large Hadron Collider (LHC), 95
    mathematical studies, 94
    open source adoption, 97
    quantum physics, 94
    solution segment, 96
  high-energy accelerators, 86
  Large Hadron Collider (LHC)
    ALEPH detector, 89
    data calculations, 90
    data generation, 91–92, 91f
    data processing architecture, 92
    DELPHI detector, 89
    detectors, 89
    experiments, 88, 90
    L3 detector, 89
    location and components, 88
    OPAL detector, 89
    Worldwide LHC Computing Grid (WLCG), 90
  mass and energy measurement, 86
  PySpark implementation, 92
  quarks and leptons, 87–88
  service for web-based analysis (SWAN), 93
  Standard Model Higgs boson, 86
  XRootD filesystem interface project, 93
Execution Engine, Hive architecture, 50

F
Filesystem snapshots, 33–36
Flight APIs, 154
Flume, 54

G
Ganglia, 185
General Data Protection Regulation (GDPR), 211
Google MapReduce cluster, 24f
  architecture, 25
  chunkservers, 24
  corruption, 24–25
  input data files, 23
  metadata, 24
  single point of failure (SPOF), 24
Graph databases, 14, 70

H
Hadoop distributed filesystem (HDFS)
  architecture, 28, 29f
  BackupNode, 33
  block allocation and storage, 30
  Checkpoint, 29–30
  CheckpointNode, 32
  Chukwa, 54
  client, 30
  data processing problem, 27–28
  DataNode, 28–29
  Filesystem snapshots, 33–36
  fundamental design principles, 27–28
  Image, 29
  Journal, 29
  NameNode, 28
  principle goals, 28
  replication and recovery, 31
  startup, 30
Hadoop technology, 9
HBASE
  architecture implementation, 47–49
  components, 48f
  data model, 46, 47f
  HBaseMaster, 47
  HRegionServer, 47
  META table, 48–49
  ROOT table, 48–49
HBaseMaster, 47
HCatalog, 54–58
High frequency trades (HFTs), 137
High-Performance Computing (HPC), 110
Hinted handoff, 65
Hive
  architecture, 50–51, 50f
  data types, 53
