Page 223 - Building Big Data Applications
Index
Enterprise data warehouse (EDW), 6
Eroom’s law, 106
European Council for Nuclear Research (CERN)
  Hadoop configuration, 92
  Higgs Boson discovery, 85–86
    Big Bang theory, 96
    drag force, 94
    governance, 97
    Large Hadron Collider (LHC), 95
    mathematical studies, 94
    open source adoption, 97
    quantum physics, 94
    solution segment, 96
  high-energy accelerators, 86
  Large Hadron Collider (LHC)
    ALEPH detector, 89
    data calculations, 90
    data generation, 91–92, 91f
    data processing architecture, 92
    DELPHI detector, 89
    detectors, 89
    experiments, 88, 90
    L3 detector, 89
    location and components, 88
    OPAL detector, 89
    Worldwide LHC Computing Grid (WLCG), 90
  mass and energy measurement, 86
  PySpark implementation, 92
  quarks and leptons, 87–88
  service for web-based analysis (SWAN), 93
  Standard Model Higgs boson, 86
  XRootD filesystem interface project, 93
Execution Engine, Hive architecture, 50

F
Filesystem snapshots, 33–36
Flight APIs, 154
Flume, 54

G
Ganglia, 185
General Data Protection Regulation (GDPR), 211
Google MapReduce cluster, 24f
  architecture, 25
  chunkservers, 24
  corruption, 24–25
  input data files, 23
  metadata, 24
  single point of failure (SPOF), 24
Graph databases, 14, 70

H
Hadoop distributed filesystem (HDFS)
  architecture, 28, 29f
  BackupNode, 33
  block allocation and storage, 30
  Checkpoint, 29–30
  CheckpointNode, 32
  Chukwa, 54
  client, 30
  data processing problem, 27–28
  DataNode, 28–29
  Filesystem snapshots, 33–36
  fundamental design principles, 27–28
  Image, 29
  Journal, 29
  NameNode, 28
  principle goals, 28
  replication and recovery, 31
  startup, 30
Hadoop technology, 9
HBASE
  architecture implementation, 47–49
  components, 48f
  data model, 46, 47f
  HBaseMaster, 47
  HRegionServer, 47
  META table, 48–49
  ROOT table, 48–49
HBaseMaster, 47
HCatalog, 54–58
High frequency trades (HFTs), 137
High-Performance Computing (HPC), 110
Hinted handoff, 65
Hive
  architecture, 50–51, 50f
  data types, 53