Page 29 - Building Big Data Applications

Chapter 2   Infrastructure and technology  23


                   Reduce (out_key, intermediate_value list) -> out_value list
                     The Reduce function written by the user accepts an intermediate key I and
                      the set of values for that key.
                     It merges these values together to form a possibly smaller set of values.
                     Typically, each Reduce invocation produces zero or one output value.
                     The intermediate values are supplied to the Reduce function via an iterator,
                      which allows it to handle lists of values that are too large to fit in
                      memory.
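
The iterator-based Reduce contract described above can be sketched in Python. This is a minimal illustration, not Google's implementation: the names reduce_sum and run_reduce are hypothetical, and the sketch assumes the intermediate pairs arrive already sorted by key, as they would after the shuffle phase.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, List, Tuple


def reduce_sum(out_key: str, intermediate_values: Iterator[int]) -> List[int]:
    """Hypothetical user Reduce: merge all values for one key into one output value."""
    total = 0
    for value in intermediate_values:  # values are pulled one at a time
        total += value
    return [total]


def run_reduce(sorted_pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Call reduce_sum once per key over key-sorted (key, value) pairs."""
    results = {}
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        # A generator keeps the values lazy, so the full list for a key
        # is never materialized in memory.
        values = (value for _, value in group)
        results[key] = reduce_sum(key, values)
    return results


if __name__ == "__main__":
    pairs = [("apple", 1), ("apple", 2), ("pear", 5)]
    print(run_reduce(pairs))
```

Because reduce_sum only ever holds a running total, the number of values per key can exceed available memory without changing the function.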

                 MapReduce Google architecture

                 In the original architecture that Google proposed and implemented, MapReduce
                 consisted of the architecture and components described in Fig. 2.3. The key pieces
                 of the architecture include the following:

                   A GFS cluster
                     A single master
                     Multiple chunkservers (workers or slaves) per master
                     Accessed by multiple clients
                     Running on commodity Linux machines
                   A file
                     Represented as fixed-sized chunks
                     Labeled with 64-bit unique global IDs
                     Stored at chunkservers and 3-way mirrored across chunkservers
                   In the GFS cluster, input data files are divided into chunks (64 MB is the standard
                 chunk size). Each chunk is assigned a unique 64-bit handle and stored as a file on a
                 chunkserver's local system. To ensure fault tolerance and scalability, each chunk is
                 replicated on at least one other server, and the default design is to create three
                 copies of a chunk (Fig. 2.4).
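
The chunking and replication arithmetic above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the sequential handle counter, and the round-robin placement are hypothetical simplifications and do not represent how the GFS master actually assigns handles or chooses replica locations.

```python
from itertools import count
from typing import Dict, List

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB standard chunk size
REPLICAS = 3                   # default number of copies per chunk

_next_handle = count(1)        # assumption: a simple counter stands in for the master


def chunk_count(file_size_bytes: int) -> int:
    """Number of fixed-size chunks needed to hold the file (ceiling division)."""
    return (file_size_bytes + CHUNK_SIZE - 1) // CHUNK_SIZE


def place_chunks(file_size_bytes: int, chunkservers: List[str],
                 replicas: int = REPLICAS) -> Dict[int, List[str]]:
    """Map each 64-bit chunk handle to the chunkservers holding its replicas."""
    if replicas > len(chunkservers):
        raise ValueError("need at least as many chunkservers as replicas")
    placement = {}
    for index in range(chunk_count(file_size_bytes)):
        handle = next(_next_handle) & 0xFFFFFFFFFFFFFFFF  # keep within 64 bits
        # Assumption: round-robin placement on distinct servers.
        placement[handle] = [
            chunkservers[(index + r) % len(chunkservers)] for r in range(replicas)
        ]
    return placement


if __name__ == "__main__":
    layout = place_chunks(200 * 1024 * 1024, ["cs1", "cs2", "cs3", "cs4"])
    for handle, servers in layout.items():
        print(handle, servers)
```

For a 200 MB file, the sketch yields four chunks (three full chunks plus a partial one), each held by three distinct chunkservers.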




















                                           FIGURE 2.3 Client-server architecture.