
2




                 Infrastructure and technology





                 This chapter introduces the infrastructure components and the technology vendors
                 that provide services. We discuss the components and their integration in detail,
                 along with any known technology limitations and the specifics of each technology
                 that users need to identify and align with.

                   The first rule of any technology used in a business is that automation applied to an
                        efficient operation will magnify the efficiency. The second is that automation
                                     applied to an inefficient operation will magnify the inefficiency.
                                                                      Source: Brainy Quote - Bill Gates




                 Introduction

                 In the previous chapter we discussed the complexities associated with big data.
                 Processing this type of data is a three-dimensional problem, the dimensions being the
                 volume of the data produced, the variety of its formats, and the velocity at which it is
                 generated. Handling any of these dimensions in a traditional data processing
                 architecture is not feasible. The problem itself did not originate in the last decade;
                 architects, researchers, and organizations have been working on it for many years.
                 A simplified approach to large data processing was to create distributed
                 data processing architectures and manage the coordination by programming language
                 techniques. This approach, while solving the volume requirement, could not handle
                 the other two dimensions. With the advent of the Internet and search engines,
                 handling complex and diverse data became a necessity rather than a one-off
                 requirement. It was during this period, beginning in the late 1990s, that a slew of
                 distributed data processing papers and associated algorithms and techniques were
                 published by Google, Stanford University, Dr. Stonebraker, Eric Brewer, Doug Cutting
                 (Nutch search engine), and Yahoo, among others.
                   Today the various architectures and papers contributed by these and other
                 developers across the world have culminated in several open source projects under the
                 Apache Software Foundation and in the NoSQL movement. The technologies identified as
                 big data processing platforms include Hadoop, Hive, HBase, Cassandra, and MapReduce.
                 NoSQL platforms include MongoDB, Neo4j, Riak, Amazon DynamoDB, MemcacheDB,
                 BerkeleyDB, Voldemort, and many more. Though many of these platforms were
                 originally developed and deployed to meet the data processing needs of web
                 applications and search engines, they have evolved to support other

                 Building Big Data Applications. https://doi.org/10.1016/B978-0-12-815746-6.00002-8
                 Copyright © 2020 Elsevier Inc. All rights reserved.