4 Data Warehousing and Online Analytical Processing
Data warehouses generalize and consolidate data in multidimensional space. The construction
of data warehouses involves data cleaning, data integration, and data transformation,
and can be viewed as an important preprocessing step for data mining. Moreover, data
warehouses provide online analytical processing (OLAP) tools for the interactive analysis
of multidimensional data of varied granularities, which facilitates effective data
generalization and data mining. Many other data mining functions, such as association,
classification, prediction, and clustering, can be integrated with OLAP operations to
enhance interactive mining of knowledge at multiple levels of abstraction. Hence, the
data warehouse has become an increasingly important platform for data analysis and
OLAP and will provide an effective platform for data mining. Therefore, data
warehousing and OLAP form an essential step in the knowledge discovery process. This chapter
presents an overview of data warehouse and OLAP technology, which is essential
for understanding the overall data mining and knowledge discovery process.
In this chapter, we study a well-accepted definition of the data warehouse and see
why more and more organizations are building data warehouses for the analysis of
their data (Section 4.1). In particular, we study the data cube, a multidimensional data
model for data warehouses and OLAP, as well as OLAP operations such as roll-up,
drill-down, slicing, and dicing (Section 4.2). We also look at data warehouse design and
usage (Section 4.3). In addition, we discuss multidimensional data mining, a
powerful paradigm that integrates data warehouse and OLAP technology with that of data
mining. An overview of data warehouse implementation examines general strategies
for efficient data cube computation, OLAP data indexing, and OLAP query
processing (Section 4.4). Finally, we study data generalization by attribute-oriented induction
(Section 4.5). This method uses concept hierarchies to generalize data to multiple levels
of abstraction.
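To make the operations named above concrete before they are defined formally, the following is a minimal Python sketch of roll-up, slicing, and dicing on a tiny sales cube. It is only an illustration under stated assumptions: the fact rows, the city-to-country concept hierarchy, and the function names (aggregate, roll_up_location, slice_quarter, dice) are all hypothetical and are not taken from this chapter or from any OLAP product.

```python
from collections import defaultdict

# Hypothetical sales facts: (city, quarter, item, units_sold). Illustrative data only.
FACTS = [
    ("Vancouver", "Q1", "phone", 10),
    ("Vancouver", "Q2", "phone", 15),
    ("Toronto", "Q1", "phone", 20),
    ("Toronto", "Q1", "laptop", 5),
    ("Chicago", "Q1", "laptop", 8),
    ("Chicago", "Q2", "phone", 12),
]

# Hypothetical concept hierarchy for the location dimension: city -> country.
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Chicago": "USA"}


def aggregate(facts):
    """Sum the measure over all facts that share the same dimension values."""
    cells = defaultdict(int)
    for *dims, units in facts:
        cells[tuple(dims)] += units
    return dict(cells)


def roll_up_location(facts):
    """Roll-up: climb the location hierarchy from city to country, then aggregate."""
    return aggregate(
        (CITY_TO_COUNTRY[city], quarter, item, units)
        for city, quarter, item, units in facts
    )


def slice_quarter(facts, quarter):
    """Slice: fix one dimension (here, time) to a single value."""
    return [f for f in facts if f[1] == quarter]


def dice(facts, cities, items):
    """Dice: select a subcube by restricting two dimensions at once."""
    return [f for f in facts if f[0] in cities and f[2] in items]


by_country = roll_up_location(FACTS)
q1_only = slice_quarter(FACTS, "Q1")
subcube = dice(FACTS, {"Toronto", "Chicago"}, {"laptop"})
```

In this sketch, drill-down is simply the reverse of roll-up: moving from the country-level cells in by_country back to the city-level rows in FACTS. Replacing each city with its country is also the basic idea behind generalizing data with a concept hierarchy, although attribute-oriented induction involves more than this single substitution.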
4.1 Data Warehouse: Basic Concepts
This section gives an introduction to data warehouses. We begin with a definition of the
data warehouse (Section 4.1.1). We outline the differences between operational database
Data Mining: Concepts and Techniques. © 2012 Elsevier Inc. All rights reserved.