                                                                 4.1 Data Warehouse: Basic Concepts  127


                                 Time-variant: Data are stored to provide information from a historical perspective
                                 (e.g., the past 5–10 years). Every key structure in the data warehouse contains, either
                                 implicitly or explicitly, a time element.
                                 Nonvolatile: A data warehouse is always a physically separate store of data trans-
                                 formed from the application data found in the operational environment. Due to
                                 this separation, a data warehouse does not require transaction processing, recovery,
                                 and concurrency control mechanisms. It usually requires only two data access
                                 operations: initial loading of data and access of data.
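
The time-variant and nonvolatile properties can be illustrated with a minimal sketch. It assumes a toy in-memory table; the names `SalesFact`, `WarehouseTable`, `load`, and `query` are hypothetical and chosen only for illustration. Each record carries an explicit time element, and the table exposes only loading and read access, with no update or delete.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: a stored record cannot be modified in place
class SalesFact:
    """One warehouse record; its key includes an explicit time element."""
    snapshot_date: date  # the time element (hypothetical field name)
    store_id: str
    amount: float


class WarehouseTable:
    """Hypothetical table offering only loading and read access."""

    def __init__(self):
        self._rows = []

    def load(self, facts):
        """Loading: append new records; existing rows are never changed."""
        self._rows.extend(facts)

    def query(self, start, end):
        """Access: read-only retrieval of records within a date range."""
        return [r for r in self._rows if start <= r.snapshot_date <= end]


table = WarehouseTable()
table.load([
    SalesFact(date(2009, 3, 31), "S1", 1200.0),
    SalesFact(date(2010, 3, 31), "S1", 1500.0),
])

# Both yearly snapshots remain available for historical comparison.
history = table.query(date(2009, 1, 1), date(2010, 12, 31))
only_2010 = table.query(date(2010, 1, 1), date(2010, 12, 31))
```

Because rows are only appended and read, this sketch needs no locking or rollback logic, which mirrors why a warehouse can do without transaction processing, recovery, and concurrency control mechanisms.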

                                 In sum, a data warehouse is a semantically consistent data store that serves as a
                               physical implementation of a decision support data model. It stores the information
                               an enterprise needs to make strategic decisions. A data warehouse is also often viewed
                               as an architecture, constructed by integrating data from multiple heterogeneous sources
                               to support structured and/or ad hoc queries, analytical reporting, and decision making.
                                 Based on this information, we view data warehousing as the process of construct-
                               ing and using data warehouses. The construction of a data warehouse requires data
                               cleaning, data integration, and data consolidation. The utilization of a data warehouse
                               often necessitates a collection of decision support technologies. These technologies allow “knowledge
                               workers” (e.g., managers, analysts, and executives) to use the warehouse to obtain an
                               overview of the data quickly and conveniently, and to make sound decisions based on
                               information in the warehouse. Some authors use the term data warehousing to refer
                               only to the process of data warehouse construction, while the term warehouse DBMS is
                               used to refer to the management and utilization of data warehouses. We will not make
                               this distinction here.
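
The three construction steps named above (data cleaning, data integration, and data consolidation) can be sketched on toy data. Everything in this example is an assumption made for illustration: the two source systems, their field names, and the rule of skipping records with a missing amount are hypothetical, not a prescribed method.

```python
# Two hypothetical operational sources with inconsistent conventions.
orders_system = [
    {"cust": " Alice ", "amount_usd": "120.50"},
    {"cust": "bob", "amount_usd": "80"},
    {"cust": "bob", "amount_usd": None},  # missing value
]
billing_system = [
    {"customer_name": "ALICE", "total_cents": 3000},
]


def clean_name(name):
    """Cleaning: normalize whitespace and letter case."""
    return name.strip().lower()


def integrate(orders, billing):
    """Integration: map both sources onto one schema, (customer, dollars)."""
    unified = []
    for row in orders:
        if row["amount_usd"] is None:  # cleaning: skip incomplete records
            continue
        unified.append((clean_name(row["cust"]), float(row["amount_usd"])))
    for row in billing:
        unified.append((clean_name(row["customer_name"]),
                        row["total_cents"] / 100))
    return unified


def consolidate(rows):
    """Consolidation: summarize the unified rows per customer."""
    totals = {}
    for customer, amount in rows:
        totals[customer] = totals.get(customer, 0.0) + amount
    return totals


warehouse_totals = consolidate(integrate(orders_system, billing_system))
# warehouse_totals == {"alice": 150.5, "bob": 80.0}
```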
                                 “How are organizations using the information from data warehouses?” Many orga-
                               nizations use this information to support business decision-making activities, includ-
                               ing (1) increasing customer focus, which includes the analysis of customer buying
                               patterns (such as buying preference, buying time, budget cycles, and appetites for
                               spending); (2) repositioning products and managing product portfolios by compar-
                               ing the performance of sales by quarter, by year, and by geographic regions in order
                               to fine-tune production strategies; (3) analyzing operations and looking for sources of
                               profit; and (4) managing customer relationships, making environmental corrections,
                               and managing the cost of corporate assets.
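
As a small illustration of use (2), comparing sales performance by quarter, by year, and by geographic region amounts to aggregating the same records along different groupings. The sketch below assumes made-up sales figures and a hypothetical helper `total_by`; it is not drawn from any particular warehouse product.

```python
from collections import defaultdict
from typing import NamedTuple


class Sale(NamedTuple):
    year: int
    quarter: int
    region: str
    amount: float


# Made-up figures, for illustration only.
sales = [
    Sale(2010, 1, "East", 100.0),
    Sale(2010, 2, "East", 150.0),
    Sale(2010, 1, "West", 200.0),
    Sale(2011, 1, "East", 120.0),
    Sale(2011, 1, "West", 180.0),
]


def total_by(records, key):
    """Sum sale amounts, grouping records by the value key(record)."""
    totals = defaultdict(float)
    for rec in records:
        totals[key(rec)] += rec.amount
    return dict(totals)


by_year = total_by(sales, lambda s: s.year)
by_quarter = total_by(sales, lambda s: (s.year, s.quarter))
by_region = total_by(sales, lambda s: s.region)
# by_year   == {2010: 450.0, 2011: 300.0}
# by_region == {"East": 370.0, "West": 380.0}
```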
                                 Data warehousing is also very useful from the point of view of heterogeneous database
                               integration. Organizations typically collect diverse kinds of data and maintain large
                               databases from multiple, heterogeneous, autonomous, and distributed information
                               sources. It is highly desirable, yet challenging, to integrate such data and provide easy
                               and efficient access to it. Much effort has been spent in the database industry and
                               research community toward achieving this goal.
                                 The traditional database approach to heterogeneous database integration is to build
                               wrappers and integrators (or mediators) on top of multiple, heterogeneous databases.
                               When a query is posed to a client site, a metadata dictionary is used to translate the
                               query into queries appropriate for the individual heterogeneous sites involved. These