Page 38 - Building Big Data Applications


NameNode confirmation about the availability of the blocks and the replicas of the
DataNode. In addition, heartbeats carry information about total storage capacity,
storage in use, and the number of data transfers currently in progress. The NameNode
uses these statistics to manage space allocation and load balancing.
During normal operations, if the NameNode does not receive a heartbeat from a
DataNode within 10 minutes, the NameNode considers the DataNode to be out of service
and the block replicas hosted by that DataNode to be unavailable. The NameNode then
schedules the creation of new replicas of those blocks on other DataNodes.
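The timeout rule above can be illustrated with a minimal sketch. It is not HDFS code: the `DataNodeInfo` record, the function names, and the block identifiers are hypothetical, and only the 10-minute limit comes from the text.

```python
from dataclasses import dataclass, field

HEARTBEAT_TIMEOUT_SECONDS = 10 * 60  # the 10-minute limit described above


@dataclass
class DataNodeInfo:
    """Hypothetical per-DataNode record kept by the NameNode."""
    node_id: str
    last_heartbeat: float  # time of the most recent heartbeat, in seconds
    blocks: set = field(default_factory=set)


def find_dead_nodes(nodes, now):
    """Return the DataNodes whose last heartbeat is older than the timeout."""
    return [n for n in nodes if now - n.last_heartbeat > HEARTBEAT_TIMEOUT_SECONDS]


def blocks_to_rereplicate(nodes, now):
    """Collect the blocks hosted on dead nodes so new replicas can be scheduled."""
    lost = set()
    for node in find_dead_nodes(nodes, now):
        lost |= node.blocks
    return lost


nodes = [
    DataNodeInfo("dn1", last_heartbeat=1000.0, blocks={"blk_1", "blk_2"}),
    DataNodeInfo("dn2", last_heartbeat=1590.0, blocks={"blk_2", "blk_3"}),
]
now = 1000.0 + HEARTBEAT_TIMEOUT_SECONDS + 1  # dn1 is just past the timeout
dead = find_dead_nodes(nodes, now)
pending = blocks_to_rereplicate(nodes, now)
```

Here only `dn1` has been silent for more than 10 minutes, so only its blocks are queued for new replicas. A real NameNode would also check how many live replicas of each block remain before scheduling copies; the sketch leaves that out.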
Heartbeats are round-trip communications: the replies carry instructions from the
NameNode to the DataNode. These include commands to

Replicate blocks to other nodes
Remove local block replicas
Reregister the node
Shut down the node
Send an immediate block report
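One way to picture this exchange is a queue of pending commands that the NameNode drains whenever a DataNode reports in. The sketch below assumes that design; the `Command` enum and the `NameNodeSketch` class are illustrative names, not the HDFS API.

```python
from enum import Enum


class Command(Enum):
    """The NameNode commands listed above; the member names are illustrative."""
    REPLICATE = "replicate blocks to other nodes"
    REMOVE = "remove local block replicas"
    REREGISTER = "reregister the node"
    SHUTDOWN = "shut down the node"
    BLOCK_REPORT = "send an immediate block report"


class NameNodeSketch:
    def __init__(self):
        self.pending = {}  # node_id -> list of queued Command values

    def queue(self, node_id, command):
        """Remember a command until the node's next heartbeat arrives."""
        self.pending.setdefault(node_id, []).append(command)

    def handle_heartbeat(self, node_id):
        """Reply to a heartbeat with the commands queued for that node."""
        return self.pending.pop(node_id, [])


namenode = NameNodeSketch()
namenode.queue("dn1", Command.REPLICATE)
namenode.queue("dn1", Command.BLOCK_REPORT)
first_reply = namenode.handle_heartbeat("dn1")
second_reply = namenode.handle_heartbeat("dn1")
```

The first heartbeat from `dn1` is answered with both queued commands, and the second with an empty list, because the queue was drained by the first reply.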
Frequent heartbeats and replies are extremely important for maintaining the overall
system integrity even on big clusters. Typically a NameNode can process thousands of
heartbeats per second without affecting other operations.

CheckpointNode and BackupNode

Apart from servicing client requests and managing DataNodes, a NameNode can be
designated to perform one of two other roles. The role is specified during startup and
can be either the CheckpointNode or the BackupNode.

CheckpointNode

The CheckpointNode captures the journal to provide a recovery mechanism for the
NameNode. At specific intervals, the CheckpointNode combines the existing checkpoint
and journal to create a new checkpoint and an empty journal. It then returns the new
checkpoint to the NameNode. The CheckpointNode runs on a different host from the
NameNode because it has the same memory requirements as the NameNode.
By creating a checkpoint, the NameNode can truncate the tail of the current journal.
HDFS clusters run for prolonged periods of time without restarts, so the journal can
grow very large, which increases the probability of loss or corruption. Periodic
checkpoints protect against this risk.
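The checkpoint cycle can be sketched as replaying the journal on top of the last checkpoint. The sketch assumes a hypothetical format in which the checkpoint is a set of file paths and each journal entry is an `(operation, path)` pair; the real HDFS image and journal formats are more involved.

```python
def apply_entry(namespace, entry):
    """Apply one journal entry to the namespace (hypothetical entry format)."""
    op, path = entry
    if op == "create":
        namespace.add(path)
    elif op == "delete":
        namespace.discard(path)
    else:
        raise ValueError(f"unknown journal operation: {op}")


def create_checkpoint(checkpoint, journal):
    """Merge the old checkpoint with the journal.

    Returns the new checkpoint and an empty journal.
    """
    namespace = set(checkpoint)  # copy so the old checkpoint is not modified
    for entry in journal:
        apply_entry(namespace, entry)
    return frozenset(namespace), []


old_checkpoint = frozenset({"/data/a", "/data/b"})
journal = [("create", "/data/c"), ("delete", "/data/a")]
new_checkpoint, new_journal = create_checkpoint(old_checkpoint, journal)
```

After the merge, the new checkpoint already reflects every journaled change, so the journal can start again from empty. This keeps the journal short between checkpoints.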
