2
Infrastructure and technology
This chapter introduces the infrastructure components and the technology vendors that
provide related services. We discuss the components and their integration in detail,
point out any technology limitations that users should be aware of, and describe the
specifics of each technology so that users can identify and align with it.
The first rule of any technology used in a business is that automation applied to an
efficient operation will magnify the efficiency. The second is that automation
applied to an inefficient operation will magnify the inefficiency.
Source: Brainy Quote, Bill Gates
Introduction
In the previous chapter, we discussed the complexities associated with big data. Processing
this type of data is a three-dimensional problem, the dimensions being the volume of data
produced, the variety of its formats, and the velocity at which it is generated.
Handling any one of these problems with a traditional data processing architecture is not
feasible. The problem itself did not originate in the last decade; architects, researchers,
and organizations have been working to solve it for many years. A simplified approach to
large-scale data processing was to build distributed data processing architectures and to
manage the coordination through programming language techniques. This approach addressed
the volume requirement but could not handle the other two dimensions. With the advent of
the Internet and search engines, handling complex and diverse data became a necessity
rather than a one-off requirement. From the late 1990s through the 2000s, a slew of papers
on distributed data processing, along with the associated algorithms and techniques, was
published by Google, Stanford University, Dr. Stonebraker, Eric Brewer, Doug Cutting
(Nutch search engine), and Yahoo, among others.
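The coordination-by-code approach described above can be illustrated with a minimal sketch, assuming a simple word-count job: the input is split into partitions, each partition is processed independently, and the partial results are merged. Worker threads stand in for the separate machines of a real distributed system, and the function names (`split_into_partitions`, `count_words`, `distributed_word_count`) are hypothetical. The sketch also assumes every record is a plain-text line available up front, which is why this style of architecture helps with volume but not with variety or velocity.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, num_partitions):
    """Divide the input into roughly equal slices, one per worker (hypothetical helper)."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_words(partition):
    """Work done independently on one partition.

    Assumes every record is a plain-text line; other formats are not handled.
    """
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def distributed_word_count(records, num_workers=4):
    """Coordinate the workers in code: scatter partitions, gather and merge results.

    Threads stand in for separate machines in this illustrative sketch.
    """
    partitions = split_into_partitions(records, num_workers)
    total = Counter()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for partial in pool.map(count_words, partitions):
            total.update(partial)  # merge step: combine per-partition counts
    return total


if __name__ == "__main__":
    lines = [
        "big data needs distributed processing",
        "distributed processing splits big jobs",
        "volume variety velocity",
    ]
    print(distributed_word_count(lines, num_workers=2).most_common(3))
```

Adding partitions and workers scales the job to more data, but the program still expects one fixed format and a complete batch of input; supporting mixed formats or continuously arriving data would require rewriting the program itself.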
Today, the architectures and papers contributed by these and other developers across the
world have culminated in several open source projects under the Apache Software Foundation,
as well as in the NoSQL movement. The technologies identified as big data processing
platforms include Hadoop, Hive, HBase, Cassandra, and MapReduce. NoSQL platforms include
MongoDB, Neo4j, Riak, Amazon DynamoDB, MemcachedDB, BerkeleyDB, Voldemort, and many more.
Although many of these platforms were originally developed and deployed to solve the data
processing needs of web applications and search engines, they have evolved to support other
Building Big Data Applications. https://doi.org/10.1016/B978-0-12-815746-6.00002-8 17
Copyright © 2020 Elsevier Inc. All rights reserved.