Page 47 - Building Big Data Applications
Chapter 2 Infrastructure and technology
Leader election: There can be only one
Group membership: Dynamically determine members of a group
Queue: Producer/consumer paradigm
In the Hadoop ecosystem, ZooKeeper is implemented as a service to coordinate tasks.
Fig. 2.12 shows the implementation model of ZooKeeper.
ZooKeeper as a service can be run in two modes:
Standalone mode: there is a single ZooKeeper server. This configuration is useful
for development or testing but provides no guarantees of high availability or
resilience.
Replicated mode: this is the mode of deployment in production, on a cluster of
machines called an ensemble. ZooKeeper achieves high availability through
replication, and can provide a service as long as a majority of the machines in the
ensemble are up and running. For example, as seen in Fig. 2.12, in a five-node
ensemble any two machines can fail and the service will still work because a
majority of three (a quorum) remains. A six-node ensemble tolerates no more
failures than that: if three machines fail, the three that remain are exactly half,
not a majority, and the service shuts down. It is therefore usual to have an odd
number of machines in an ensemble, since an extra even-numbered machine adds cost
without adding fault tolerance.
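The majority rule described above can be checked with a few lines of arithmetic. The following is a minimal sketch, not ZooKeeper code: the function names quorum_size, tolerated_failures, and is_available are hypothetical and chosen only for illustration, and the sketch assumes the simple "strict majority" rule stated in the text.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of servers that can fail while a majority still remains."""
    return ensemble_size - quorum_size(ensemble_size)


def is_available(ensemble_size: int, failed: int) -> bool:
    """True if the surviving servers still form a strict majority."""
    return ensemble_size - failed >= quorum_size(ensemble_size)


if __name__ == "__main__":
    # Print ensemble size, quorum size, and tolerated failures side by side.
    for n in (3, 4, 5, 6, 7):
        print(n, quorum_size(n), tolerated_failures(n))
```

Under this rule, five-node and six-node ensembles both tolerate two failures, which is why the sixth machine buys no additional resilience.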
ZooKeeper has one task or goal: to ensure that all znode changes across the
system are propagated to the leader and followers. When a minority of machines
fail, those machines must catch up on the changes they missed once they are brought
back up. To manage the ensemble, ZooKeeper uses a protocol called Zab, which runs
in two steps that can be repeated as needed.
FIGURE 2.12 ZooKeeper ensemble.
1. Leader election: The machines in an ensemble go through a process of electing a
distinguished member, called the leader. Clients communicate with one server in a
session and perform read or write operations. As seen in Fig. 2.12, each write is
accomplished only through the leader and is then broadcast to the followers as an