Page 54 - Building Big Data Applications





                            [Figure: API; Master; RegionServer (HFile, MemStore, Write Ahead Log); Zookeeper; HDFS]
                            FIGURE 2.14 HBASE components. Image source: George Lars, @HUG Talk.


               The HBase client is a programming API that can be called from languages such as
                Java or C++ to access HBASE.
               Zookeeper: HBASE uses Zookeeper to coordinate all activities between the master
                and the region servers.

                How does HBASE internally manage all the communication between Zookeeper,
             master servers, and region servers? HBASE maintains two special catalog tables named
             ROOT and META, which together hold the current list, state, and location of every region
             on the cluster. The ROOT table contains the list of META table regions, and the META table
             contains the list of all userspace regions. Entries in the ROOT and META tables are keyed by
             region name, where a region name is made up of the name of the table the region belongs to,
             the region's start row, its time of creation, and a hash key value. Rowkeys are sorted by
             default, so finding the region that hosts a particular row is a matter of a lookup to find the
             last entry whose key is less than or equal to that of the requested rowkey. As regions are
             split, deleted, or disabled, the ROOT and META tables are updated so that the changes are
             reflected in subsequent user requests.
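
             The sorted-key lookup described above can be illustrated with a small sketch. It is
             not HBASE code: the RegionEntry class, the find_region helper, the region-name format,
             and the server addresses are all hypothetical, and the hash component of the region
             name is left out. The sketch assumes the catalog entries for one table are kept sorted
             by start row and picks the entry with the largest start row that does not exceed the
             requested rowkey.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionEntry:
    """Hypothetical catalog entry for one region of a userspace table."""
    table: str
    start_row: str    # inclusive start row; "" means the start of the table
    created_ms: int   # creation time of the region
    server: str       # hypothetical host:port of the hosting region server

    @property
    def region_name(self) -> str:
        # Illustrative format only: table name, start row, creation time.
        # The hash value mentioned in the text is omitted in this sketch.
        return f"{self.table},{self.start_row},{self.created_ms}"


def find_region(entries, row):
    """Return the entry hosting `row`.

    `entries` must belong to one table and be sorted by start_row.
    The hosting region is the one with the largest start_row <= row.
    """
    start_rows = [e.start_row for e in entries]
    idx = bisect_right(start_rows, row) - 1
    if idx < 0:
        raise KeyError(f"no region covers row {row!r}")
    return entries[idx]


# Three regions of a hypothetical "users" table, sorted by start row.
catalog = [
    RegionEntry("users", "", 1, "rs1:60020"),
    RegionEntry("users", "g", 2, "rs2:60020"),
    RegionEntry("users", "p", 3, "rs3:60020"),
]

for rowkey in ("alice", "g", "mike", "zed"):
    entry = find_region(catalog, rowkey)
    print(rowkey, "->", entry.region_name, "on", entry.server)
```

             A binary search is used here because the entries are already sorted, so the lookup
             cost grows with the logarithm of the number of regions rather than with their count.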
                Clients connect to ZooKeeper and obtain the location of the ROOT table. ROOT
             provides information about META, which points to the region whose scope covers the
             requested row. The client then gets the details of the region, including its userspace
             table, column family, and location, by doing a lookup on the META table. After this
             initial interaction, the client works directly with the hosting region server.
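
             The lookup chain in the paragraph above (ZooKeeper, then ROOT, then META, then the
             region server) can be sketched with in-memory stand-ins. Everything here is assumed
             for illustration: the ZOOKEEPER, ROOT, and META dictionaries, the key layout
             "table,row", the server addresses, and the locate function are hypothetical and are
             not part of any HBASE client API.

```python
from bisect import bisect_right

# Hypothetical stand-in for ZooKeeper: it only knows where ROOT is served.
ZOOKEEPER = {"root-region-server": "rs-root:60020"}

# Hypothetical ROOT table: sorted (start key, server hosting that META region).
ROOT = [
    ("", "rs-meta-1:60020"),
    ("users,m", "rs-meta-2:60020"),
]

# Hypothetical META regions, keyed by the server that hosts them.
# Each holds sorted (start key, server hosting the userspace region).
META = {
    "rs-meta-1:60020": [("orders,", "rs1:60020"), ("users,", "rs2:60020")],
    "rs-meta-2:60020": [("users,m", "rs3:60020")],
}


def closest_at_or_before(entries, key):
    """Return the value of the last entry whose start key is <= key."""
    keys = [k for k, _ in entries]
    idx = bisect_right(keys, key) - 1
    if idx < 0:
        raise KeyError(key)
    return entries[idx][1]


def locate(table, row):
    """Walk ZooKeeper -> ROOT -> META and return every hop taken."""
    lookup_key = f"{table},{row}"  # simplified key layout for this sketch
    root_server = ZOOKEEPER["root-region-server"]                        # step 1
    meta_server = closest_at_or_before(ROOT, lookup_key)                 # step 2
    region_server = closest_at_or_before(META[meta_server], lookup_key)  # step 3
    return root_server, meta_server, region_server


for table, row in (("users", "alice"), ("users", "zed"), ("orders", "42")):
    print(table, row, "->", locate(table, row))
```

             After the last hop the client would send reads and writes for that row straight to
             the returned region server, which is the point the paragraph makes.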
                HBASE clients cache what they learn while traversing ROOT and META: region
             locations as well as the userspace table and the start and stop rows of each region. The
             cached data gives the client the details of the regions and the data available there, avoiding
             round trips to read the META table. In normal operation, clients continue to
             use the cached entries as they perform tasks, until there is a failure or abort. When a
             failure happens, it is normally due to the movement of the region itself causing the cache