Page 37 - Building Big Data Applications

P. 37

Chapter 2 Infrastructure and technology 31

Replication and recovery

In the original design of HDFS there was a single NameNode for each cluster, which
became the single point of failure. This has been addressed in the recent releases of
HDFS where NameNode replication is now a standard feature like DataNode replication.

NameNode and DataNodedcommunication and
management

The communication and management between a NameNode and DataNodes are
managed through a series of handshakes and system ID’s. Upon initial creation and
formatting, a namespace ID is assigned to the ﬁlesystem on the NameNode. This ID is
persistently stored on all the nodes across the cluster. DataNodes similarly are assigned a
unique storage_idon the initial creation and registration with a NameNode. This stor-
age_id never changes and will be persistent event if the DataNode is started on a
different IP Address or Port.
During startup process, the NameNode completes its namespace refresh and is ready
to establish the communication with the DataNode. To ensure that each DataNode that
connects to the NameNode is the correct DataNode, there is a series of veriﬁcation steps:
The DataNode identiﬁes itself to the NamNode with a handshake and veriﬁes its
namespace ID and software version.
If either does not match with the NameNode, the DataNode automatically shuts
down.
The signature veriﬁcation process prevents incorrect nodes from joining the cluster
and automatically preserves the integrity of the ﬁlesystem.
The signature veriﬁcation process also is an assurance check for consistency of
software versions between the NameNode and DataNode since incompatible
version can cause data corruption or loss.
Post the handshake and validation on the NameNode, a DataNode sends a block
report. A block report contains the block id, the length for each block replica and
the generation stamp.
The ﬁrst block report is sent immediately upon the DataNode registration.
Subsequently hourly updates of the block report is sent to the NameNode, which
provides the view of where block replicas are located on the cluster.
When a new DataNode is added and initialized, since it does not have a name-
space ID is permitted to join the cluster and receive the cluster’s namespace ID.

Heartbeats

The connectivity between the NameNode and DataNode are managed by the persistent
heartbeats that are sent by the DataNode every 3 seconds. The heartbeat provides the

32 33 34 35 36 37 38 39 40 41 42