256 Part Two Information Technology Infrastructure
volumes of data as well as for a staging area for unstructured and semi-structured
data before they are loaded into a data warehouse. Facebook stores much of
its data on its massive Hadoop cluster, which holds an estimated 100 petabytes,
about 10,000 times more information than the Library of Congress. Yahoo uses
Hadoop to track user behavior so it can modify its home page to fit their interests.
Life sciences research firm NextBio uses Hadoop and HBase to process data for
pharmaceutical companies conducting genomic research. Top database vendors
such as IBM, Hewlett-Packard, Oracle, and Microsoft have their own Hadoop
software distributions. Other vendors offer tools for moving data into and out of
Hadoop or for analyzing data within Hadoop.
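Hadoop's core programming model is MapReduce: a map function emits key-value pairs from raw records, and a reduce function aggregates all values sharing a key. The following sketch runs the same three phases (map, shuffle, reduce) on a single machine purely for illustration; in a real Hadoop cluster these phases are distributed across many nodes, and the sample log lines are hypothetical.

```python
# A minimal, single-machine sketch of the MapReduce model that Hadoop
# distributes across a cluster. Illustrative only; not Hadoop's actual API.
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Map phase: emit a (word, 1) pair for each word in one input record."""
    for word in line.lower().split():
        yield (word, 1)

def reducer(word, counts):
    """Reduce phase: combine all counts emitted for one word."""
    return (word, sum(counts))

def run_job(lines):
    # Map: apply the mapper to every input record.
    pairs = [kv for line in lines for kv in mapper(line)]
    # Shuffle: group pairs by key (Hadoop does this between cluster nodes).
    pairs.sort(key=itemgetter(0))
    # Reduce: one reducer call per distinct key.
    return dict(
        reducer(word, (count for _, count in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    )

# Hypothetical clickstream records, like the user-behavior data Yahoo tracks.
log_lines = ["user clicked ad", "user viewed page", "user clicked link"]
print(run_job(log_lines))
# {'ad': 1, 'clicked': 2, 'link': 1, 'page': 1, 'user': 3, 'viewed': 1}
```

Because each reducer only ever sees one key's values, Hadoop can run thousands of reducers in parallel, which is what makes the model scale to Facebook-sized datasets.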
In-Memory Computing
Another way of facilitating big data analysis is to use in-memory computing,
which relies primarily on a computer’s main memory (RAM) for data storage.
(A conventional DBMS uses disk storage systems.) Users access data stored in
the system's primary memory, thereby eliminating bottlenecks from retrieving and
reading data in a traditional, disk-based database and dramatically shortening
query response times. In-memory processing makes it possible for very large
sets of data, amounting to the size of a data mart or small data warehouse, to
reside entirely in memory. Complex business calculations that used to take
hours or days can be completed within seconds, even on handheld devices.
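The difference between a disk-based and an in-memory database can be seen in miniature with SQLite, whose ":memory:" mode keeps every database page in RAM rather than in a file. The sample table and values below are invented for illustration; products such as SAP HANA apply the same principle at data-warehouse scale.

```python
# Sketch of in-memory database storage using SQLite's ":memory:" mode.
# All pages live in RAM, so the read path involves no disk I/O at all.
import sqlite3

conn = sqlite3.connect(":memory:")  # no file on disk is ever created
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 45.5)],  # hypothetical data
)

# An analytic query runs entirely against memory-resident pages.
total_by_region = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)
print(total_by_region)  # {'north': 165.5, 'south': 80.0}
```

The query-speed gains described above come from exactly this change in the read path: main-memory access is orders of magnitude faster than a disk seek.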
The previous chapter describes some of the advances in contemporary
computer hardware technology that make in-memory processing possible, such
as powerful high-speed processors, multicore processing, and falling computer
memory prices. These technologies help companies optimize the use of memory
and accelerate processing performance while lowering costs.
Leading commercial products for in-memory computing include SAP's High-
Performance Analytic Appliance (HANA) and Oracle Exalytics. Each provides
a set of integrated software components, including in-memory database software
and specialized analytics software, that run on hardware optimized for in-memory
computing work.
Centrica, a gas and electric utility, uses HANA to quickly capture and analyze
the vast amounts of data generated by smart meters. The company is able to
analyze usage every 15 minutes, giving it a much clearer picture of usage by
neighborhood, home size, type of business served, or building type. HANA also
helps Centrica show its customers their energy usage patterns in real time using
online and mobile tools.
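The interval analysis described above amounts to bucketing raw meter readings into 15-minute windows and summing consumption per window. The sketch below illustrates the idea; the field layout and sample readings are hypothetical, not Centrica's actual schema, and usage is kept in integer watt-hours to avoid floating-point rounding.

```python
# Hedged sketch: aggregating smart-meter readings into 15-minute intervals.
from collections import defaultdict
from datetime import datetime

def bucket_15min(ts):
    """Round a timestamp down to the start of its 15-minute interval."""
    return ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)

def usage_by_interval(readings):
    """Sum watt-hours of usage per 15-minute window."""
    totals = defaultdict(int)
    for ts, watt_hours in readings:
        totals[bucket_15min(ts)] += watt_hours
    return dict(totals)

# Hypothetical readings: (timestamp, watt-hours consumed since last reading).
readings = [
    (datetime(2013, 1, 17, 9, 2), 400),
    (datetime(2013, 1, 17, 9, 11), 300),
    (datetime(2013, 1, 17, 9, 20), 500),
]
print(usage_by_interval(readings))
# {datetime(2013, 1, 17, 9, 0): 700, datetime(2013, 1, 17, 9, 15): 500}
```

At utility scale this same aggregation runs over millions of meters, which is why an in-memory platform is needed to keep the analysis interactive.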
Analytic Platforms
Commercial database vendors have developed specialized high-speed analytic
platforms using both relational and non-relational technology that are
optimized for analyzing large datasets. These analytic platforms, such as IBM
Netezza and Oracle Exadata, feature preconfigured hardware-software systems
that are specifically designed for query processing and analytics. For example,
IBM Netezza features tightly integrated database, server, and storage
components that handle complex analytic queries 10 to 100 times faster than
traditional systems. Analytic platforms also include in-memory systems and NoSQL
non-relational database management systems.
Figure 6.12 illustrates a contemporary business intelligence infrastructure
using the technologies we have just described. Current and historical data are
extracted from multiple operational systems along with Web data, machine-
generated data, unstructured audio/visual data, and data from external sources