Page 163 -
P. 163

HAN 11-ch04-125-186-9780123814791
                                                             2011/6/1

          126   Chapter 4 Data Warehousing and Online Analytical Processing  3:17  Page 126  #2



                         systems and data warehouses (Section 4.1.2), then explain the need for using data ware-
                         houses for data analysis, rather than performing the analysis directly on traditional
                         databases (Section 4.1.3). This is followed by a presentation of data warehouse architec-
                         ture (Section 4.1.4). Next, we study three data warehouse models—an enterprise model,
                         a data mart, and a virtual warehouse (Section 4.1.5). Section 4.1.6 describes back-end
                         utilities for data warehousing, such as extraction, transformation, and loading. Finally,
                         Section 4.1.7 presents the metadata repository, which stores data about data.



                   4.1.1 What Is a Data Warehouse?

                         Data warehousing provides architectures and tools for business executives to system-
                         atically organize, understand, and use their data to make strategic decisions. Data
                         warehouse systems are valuable tools in today’s competitive, fast-evolving world. In the
                         last several years, many firms have spent millions of dollars in building enterprise-wide
                         data warehouses. Many people feel that with competition mounting in every industry,
                         data warehousing is the latest must-have marketing weapon—a way to retain customers
                         by learning more about their needs.
                           “Then, what exactly is a data warehouse?” Data warehouses have been defined in many
                         ways, making it difficult to formulate a rigorous definition. Loosely speaking, a data
                         warehouse refers to a data repository that is maintained separately from an organiza-
                         tion’s operational databases. Data warehouse systems allow for integration of a variety of
                         application systems. They support information processing by providing a solid platform
                         of consolidated historic data for analysis.
                           According to William H. Inmon, a leading architect in the construction of data
                         warehouse systems, “A data warehouse is a subject-oriented, integrated, time-variant,
                         and nonvolatile collection of data in support of management’s decision making pro-
                         cess” [Inm96]. This short but comprehensive definition presents the major features of
                         a data warehouse. The four keywords—subject-oriented, integrated, time-variant, and
                         nonvolatile—distinguish data warehouses from other data repository systems, such as
                         relational database systems, transaction processing systems, and file systems.
                           Let’s take a closer look at each of these key features.


                           Subject-oriented: A data warehouse is organized around major subjects such as cus-
                           tomer, supplier, product, and sales. Rather than concentrating on the day-to-day
                           operations and transaction processing of an organization, a data warehouse focuses
                           on the modeling and analysis of data for decision makers. Hence, data warehouses
                           typically provide a simple and concise view of particular subject issues by excluding
                           data that are not useful in the decision support process.
                           Integrated: A data warehouse is usually constructed by integrating multiple hetero-
                           geneous sources, such as relational databases, flat files, and online transaction
                           records. Data cleaning and data integration techniques are applied to ensure con-
                           sistency in naming conventions, encoding structures, attribute measures, and so on.
   158   159   160   161   162   163   164   165   166   167   168