





                       4       Data Warehousing and Online Analytical Processing







                               Data warehouses generalize and consolidate data in multidimensional space. Constructing
                               a data warehouse involves data cleaning, data integration, and data transformation,
                               and can be viewed as an important preprocessing step for data mining. Moreover, data
                               warehouses provide online analytical processing (OLAP) tools for the interactive analysis
                               of multidimensional data of varied granularities, which facilitates effective data
                               generalization and data mining. Many other data mining functions, such as association,
                               classification, prediction, and clustering, can be integrated with OLAP operations to
                               enhance interactive mining of knowledge at multiple levels of abstraction. Hence, the
                               data warehouse has become an increasingly important platform for data analysis and
                               OLAP, and it provides an effective platform for data mining. Data warehousing and OLAP
                               therefore form an essential step in the knowledge discovery process. This chapter
                               presents an overview of data warehouse and OLAP technology, which is essential
                               for understanding the overall data mining and knowledge discovery process.
                                 In this chapter, we study a well-accepted definition of the data warehouse and see
                               why more and more organizations are building data warehouses for the analysis of
                               their data (Section 4.1). In particular, we study the data cube, a multidimensional data
                               model for data warehouses and OLAP, as well as OLAP operations such as roll-up,
                               drill-down, slicing, and dicing (Section 4.2). We also look at data warehouse design and
                               usage (Section 4.3). In addition, we discuss multidimensional data mining, a powerful
                               paradigm that integrates data warehouse and OLAP technology with that of data
                               mining. An overview of data warehouse implementation examines general strategies
                               for efficient data cube computation, OLAP data indexing, and OLAP query processing
                               (Section 4.4). Finally, we study data generalization by attribute-oriented induction
                               (Section 4.5). This method uses concept hierarchies to generalize data to multiple levels
                               of abstraction.
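
                               As an informal preview of the OLAP operations named above, the following is a minimal
                               sketch in Python (standard library only) of roll-up, drill-down, slice, and dice on a
                               tiny fact table. The sales data, the dimension names, and the helper functions
                               roll_up, slice_, and dice are hypothetical illustrations chosen for this sketch; they
                               are not the definitions developed in Section 4.2, and a real warehouse would not
                               compute aggregates this way.

```python
from collections import defaultdict

# Hypothetical fact table (illustrative data only).
# Each row: (city, country, quarter, item, units_sold).
COLUMNS = ("city", "country", "quarter", "item", "units_sold")
SALES = [
    ("Vancouver", "Canada", "Q1", "phone", 10),
    ("Vancouver", "Canada", "Q2", "phone", 20),
    ("Toronto", "Canada", "Q1", "computer", 5),
    ("Chicago", "USA", "Q1", "phone", 7),
    ("Chicago", "USA", "Q2", "computer", 3),
]


def roll_up(rows, dims):
    """Group rows by the given dimensions and sum units_sold."""
    idx = [COLUMNS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_(rows, dim, value):
    """Keep rows where one dimension equals a single value."""
    i = COLUMNS.index(dim)
    return [row for row in rows if row[i] == value]


def dice(rows, conditions):
    """Keep rows whose values fall in the allowed set for every listed dimension."""
    return [
        row
        for row in rows
        if all(row[COLUMNS.index(d)] in allowed for d, allowed in conditions.items())
    ]


# Roll-up: summarize at the coarser "country" level of the location hierarchy.
by_country = roll_up(SALES, ["country"])

# Drill-down: return to the finer "city" level.
by_city = roll_up(SALES, ["city"])

# Slice: fix one dimension (quarter = Q1).
q1_sales = slice_(SALES, "quarter", "Q1")

# Dice: select on two dimensions at once.
canada_phones = dice(SALES, {"country": {"Canada"}, "item": {"phone"}})

print(by_country)
print(by_city)
print(len(q1_sales), len(canada_phones))
```

                               With this sample data, the roll-up to country totals 35 units for Canada and 10 for
                               the USA, while the drill-down to city separates Vancouver (30), Toronto (5), and
                               Chicago (10). The city-to-country mapping plays the role of a simple concept hierarchy.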

                       4.1     Data Warehouse: Basic Concepts



                               This section gives an introduction to data warehouses. We begin with a definition of the
                               data warehouse (Section 4.1.1). We outline the differences between operational database

                               Data Mining: Concepts and Techniques. © 2012 Elsevier Inc. All rights reserved.