Page 45 - Building Big Data Applications

Chapter 2   Infrastructure and technology  39

                                      FIGURE 2.11 Conceptual SQL/MapReduce architecture.

                   • Once a file has been processed, it cannot be reprocessed from a midpoint. If a
                     new version of the data arrives as a file, the entire file has to be processed
                     again.
                   • MapReduce on large clusters can be difficult to manage.
                   • The entire platform is designed to handle extremely large files and hence is
                     not suited to transaction processing.
                   • When files are split for processing, the consistency of the pieces completing
                     processing across all nodes in a cluster follows a soft-state model of
                     eventual consistency.

                 ZooKeeper

                 Developing large-scale applications on Hadoop or any distributed platform requires
                 a resource and application coordinator to coordinate tasks between nodes. In a
                 controlled environment such as an RDBMS or SOA programming, tasks are generated in
                 a controlled manner, and coordination simply needs to ensure successful network
                 management without data loss, along with health checks on the nodes in the
                 distributed system. In the case of Hadoop, the minimum volume of data starts at
                 multiple terabytes, and the data is distributed across files on multiple nodes.
                 Keeping track of user queries and their associated tasks mandates a coordinator
                 that is as flexible and scalable as the platform itself.
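
                 The coordination duties described above (tracking which nodes are healthy and
                 which node owns which task) can be illustrated with a minimal, single-process
                 sketch. This is a toy model and not the API of any real coordinator: the class
                 Coordinator, its methods, and the node and task names are all hypothetical, and
                 a real service would replicate this state across several servers.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Coordinator:
    """Toy in-process coordinator (hypothetical): heartbeats plus task ownership."""
    timeout: float = 5.0  # seconds without a heartbeat before a node counts as dead
    heartbeats: dict = field(default_factory=dict)   # node name -> last heartbeat time
    assignments: dict = field(default_factory=dict)  # task id -> owning node name

    def heartbeat(self, node, now=None):
        """Record that a node is alive; `now` can be injected for testing."""
        self.heartbeats[node] = time.monotonic() if now is None else now

    def live_nodes(self, now=None):
        """Return the sorted names of nodes whose last heartbeat is recent enough."""
        now = time.monotonic() if now is None else now
        return sorted(n for n, t in self.heartbeats.items() if now - t <= self.timeout)

    def assign(self, task, now=None):
        """Give the task to the live node that currently owns the fewest tasks."""
        live = self.live_nodes(now)
        if not live:
            raise RuntimeError("no live nodes")
        load = {n: 0 for n in live}
        for owner in self.assignments.values():
            if owner in load:
                load[owner] += 1
        node = min(live, key=lambda n: (load[n], n))  # ties broken by node name
        self.assignments[task] = node
        return node

    def reassign_orphans(self, now=None):
        """Move tasks owned by dead nodes to live ones; return the moved task ids."""
        live = set(self.live_nodes(now))
        orphans = sorted(t for t, n in self.assignments.items() if n not in live)
        for task in orphans:
            self.assign(task, now)
        return orphans


if __name__ == "__main__":
    c = Coordinator(timeout=5.0)
    c.heartbeat("node-a", now=0.0)
    c.heartbeat("node-b", now=0.0)
    c.assign("t1", now=1.0)
    c.assign("t2", now=1.0)
    c.heartbeat("node-a", now=6.0)   # node-b stops reporting
    print(c.reassign_orphans(now=7.0), c.assignments)
```

                 Timestamps are passed in explicitly in the usage example so that the behavior
                 does not depend on wall-clock time. The sketch keeps all state in one process,
                 which is exactly the single point of failure a distributed coordinator is meant
                 to avoid.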
                   ZooKeeper is an open-source, in-memory, distributed NoSQL database that provides
                 coordination services for managing distributed applications. It consists of a simple
                 set of functions that can be used to build services for synchronization, configuration
                 maintenance, groups, and naming. ZooKeeper has a filesystem structure that mirrors