

volumes of data as well as for a staging area for unstructured and semi-structured data
before they are loaded into a data warehouse. Facebook stores much of its data on its
massive Hadoop cluster, which holds an estimated 100 petabytes, about 10,000 times more
information than the Library of Congress. Yahoo uses Hadoop to track user behavior so it
can modify its home page to fit users' interests. Life sciences research firm NextBio uses
Hadoop and HBase to process data for pharmaceutical companies conducting genomic research.
Top database vendors such as IBM, Hewlett-Packard, Oracle, and Microsoft have their own
Hadoop software distributions. Other vendors offer tools for moving data into and out of
Hadoop or for analyzing data within Hadoop.
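
One common way to analyze data within Hadoop is Hadoop Streaming, which lets the map and
reduce steps of a job be written as small scripts that read standard input and write
standard output. The sketch below is a minimal, hypothetical Python example that counts
page requests per user in a web log; the log layout, field positions, and file names are
assumptions for illustration, not part of any vendor's distribution.

    #!/usr/bin/env python3
    # mapper.py -- one log line in, one "user_id<TAB>1" pair out.
    # Assumed (hypothetical) log format: user_id, url, timestamp, tab-separated.
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            print(f"{fields[0]}\t1")

    #!/usr/bin/env python3
    # reducer.py -- Hadoop Streaming delivers mapper output sorted by key, so all
    # counts for one user arrive together and can be summed with a running total.
    import sys

    current_user, count = None, 0
    for line in sys.stdin:
        user_id, value = line.rstrip("\n").split("\t")
        if user_id != current_user:
            if current_user is not None:
                print(f"{current_user}\t{count}")
            current_user, count = user_id, 0
        count += int(value)
    if current_user is not None:
        print(f"{current_user}\t{count}")

The two scripts would be submitted through the Hadoop Streaming jar (roughly,
hadoop jar hadoop-streaming.jar -input logs/ -output counts/ -mapper mapper.py
-reducer reducer.py), with the framework handling data distribution, sorting, and fault
tolerance across the cluster; the exact jar path and options vary by distribution.
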

In-Memory Computing
Another way of facilitating big data analysis is to use in-memory computing, which relies
primarily on a computer's main memory (RAM) for data storage. (Conventional DBMSs use
disk-based storage systems.) Users access data stored in system primary memory,
eliminating the bottlenecks that come from retrieving and reading data in a traditional,
disk-based database and dramatically shortening query response times. In-memory processing
makes it possible for very large sets of data, amounting to the size of a data mart or
small data warehouse, to reside entirely in memory. Complex business calculations that
used to take hours or days can be completed within seconds, even on handheld devices.
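
The effect of removing the disk from the read path of a query can be seen with a small
experiment. The sketch below is a rough illustration rather than a benchmark: it runs the
same aggregate query against an on-disk and an in-memory SQLite database. SQLite is used
only because it ships with Python, and the table and row counts are invented.

    import os
    import random
    import sqlite3
    import time

    def load_and_query(target):
        # target is a file path (disk-based) or ":memory:" (in-memory).
        conn = sqlite3.connect(target)
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        rows = [(random.choice("NESW"), random.random() * 100) for _ in range(500_000)]
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        conn.commit()
        start = time.perf_counter()
        conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
        elapsed = time.perf_counter() - start
        conn.close()
        return elapsed

    if os.path.exists("sales.db"):
        os.remove("sales.db")          # start from a fresh on-disk database
    print("disk-based query time :", load_and_query("sales.db"))
    print("in-memory query time  :", load_and_query(":memory:"))

Commercial in-memory platforms go much further, keeping entire data marts resident in RAM
and adding columnar storage and compression, but the basic idea of serving queries from
main memory rather than disk is the same.
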
The previous chapter describes some of the advances in contemporary computer hardware
technology that make in-memory processing possible, such as powerful high-speed
processors, multicore processing, and falling computer memory prices. These technologies
help companies optimize the use of memory and accelerate processing performance while
lowering costs.

Leading commercial products for in-memory computing include SAP's High-Performance
Analytic Appliance (HANA) and Oracle Exalytics. Each provides a set of integrated software
components, including in-memory database software and specialized analytics software, that
run on hardware optimized for in-memory computing work.

Centrica, a gas and electric utility, uses HANA to quickly capture and analyze the vast
amounts of data generated by smart meters. The company is able to analyze usage every
15 minutes, giving it a much clearer picture of usage by neighborhood, home size, type of
business served, or building type. HANA also helps Centrica show its customers their
energy usage patterns in real time using online and mobile tools.
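
The kind of interval analysis described here can be sketched with the pandas library,
assuming the raw readings arrive as one row per meter per minute; the column names and
generated data below are hypothetical and do not reflect Centrica's actual HANA
implementation.

    import numpy as np
    import pandas as pd

    # Hypothetical raw smart-meter readings: one row per meter per minute (kWh used).
    minutes = pd.date_range("2024-01-01", periods=24 * 60, freq="min")
    readings = pd.DataFrame({
        "timestamp": np.tile(minutes, 2),
        "meter_id": ["M1"] * len(minutes) + ["M2"] * len(minutes),
        "kwh": np.random.rand(2 * len(minutes)),
    })

    # Roll the minute-level readings up into the 15-minute intervals described
    # in the text, one total per meter per interval.
    usage_15min = (
        readings.set_index("timestamp")
                .groupby("meter_id")["kwh"]
                .resample("15min")
                .sum()
    )
    print(usage_15min.head())

Grouping by a neighborhood, building-type, or customer-segment column instead of meter_id
would produce the comparisons by neighborhood, home size, or type of business mentioned
above.
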
Analytic Platforms
Commercial database vendors have developed specialized high-speed analytic platforms,
using both relational and non-relational technology, that are optimized for analyzing
large datasets. These analytic platforms, such as IBM Netezza and Oracle Exadata, feature
preconfigured hardware-software systems that are specifically designed for query
processing and analytics. For example, IBM Netezza features tightly integrated database,
server, and storage components that handle complex analytic queries 10 to 100 times faster
than traditional systems. Analytic platforms also include in-memory systems and NoSQL
non-relational database management systems.
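
The complex analytic queries these platforms are tuned for are typically large joins and
aggregations over historical data. The sketch below shows the general shape of such a
query issued from Python; it uses a tiny in-memory SQLite database so it can run anywhere,
and the table and column names are placeholders rather than Netezza- or Exadata-specific
syntax.

    import sqlite3

    # Stand-in warehouse tables; on an analytic appliance these would hold
    # millions or billions of rows spread across many storage nodes.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (order_id INTEGER, customer_id INTEGER,
                             order_date TEXT, amount REAL);
        CREATE TABLE customers (customer_id INTEGER, region TEXT);
        INSERT INTO orders VALUES (1, 1, '2024-01-05', 120.0),
                                  (2, 2, '2024-02-10',  80.0),
                                  (3, 1, '2024-02-15', 200.0);
        INSERT INTO customers VALUES (1, 'North'), (2, 'South');
    """)

    # A typical analytic query: join the fact table to a dimension table,
    # then aggregate revenue by region and month.
    query = """
        SELECT c.region,
               strftime('%Y-%m', o.order_date) AS month,
               SUM(o.amount)                   AS revenue
        FROM orders o
        JOIN customers c ON c.customer_id = o.customer_id
        GROUP BY c.region, month
        ORDER BY c.region, month
    """
    for region, month, revenue in conn.execute(query):
        print(region, month, revenue)

Appliances such as Netezza accelerate exactly this pattern by running the scan, join, and
aggregation work in parallel on hardware placed close to the stored data.
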

Figure 6.12 illustrates a contemporary business intelligence infrastructure using the
technologies we have just described. Current and historical data are extracted from
multiple operational systems along with Web data, machine-generated data, unstructured
audio/visual data, and data from external sources






