Chapter 4 Data Warehousing and Online Analytical Processing

OLAP operations may be performed on the target and contrasting classes as deemed
necessary by the user in order to adjust the abstraction levels of the final description.

In summary, attribute-oriented induction for data characterization and generalization
provides an alternative data generalization method in comparison to the data cube
approach. It is not confined to relational data because such an induction can be
performed on spatial, multimedia, sequence, and other kinds of data sets. In addition,
there is no need to precompute a data cube because generalization can be performed
online upon receiving a user’s query.

Moreover, automated analysis can be added to such an induction process to
automatically filter out irrelevant or unimportant attributes. However, because
attribute-oriented induction automatically generalizes data to a higher level, it cannot
efficiently support the process of drilling down to levels deeper than those provided in
the generalized relation. The integration of data cube technology with attribute-oriented
induction may provide a balance between precomputation and online computation. This
would also support fast online computation when it is necessary to drill down to a level
deeper than that provided in the generalized relation.
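
The online generalization described above can be illustrated with a minimal sketch. It assumes two basic steps: removing attributes that have no useful generalization, and replacing each remaining value by a higher-level concept before merging identical tuples with a count. The student records, the `CITY_TO_COUNTRY` mapping, the age ranges, and all function names are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical concept hierarchy: city -> country.
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Chicago": "USA"}


def generalize_city(city):
    """Lift a city to its country; unknown cities go to the top level 'ANY'."""
    return CITY_TO_COUNTRY.get(city, "ANY")


def generalize_age(age):
    """Lift a numeric age to a coarse range (an assumed hierarchy)."""
    if age < 20:
        return "under_20"
    if age < 30:
        return "20_29"
    return "30_plus"


def attribute_oriented_induction(rows, generalizers, drop=()):
    """Drop the listed attributes, generalize the rest, merge equal tuples."""
    counts = Counter()
    for row in rows:
        key = tuple(
            (attr, generalizers.get(attr, lambda v: v)(value))
            for attr, value in sorted(row.items())
            if attr not in drop
        )
        counts[key] += 1  # identical generalized tuples accumulate a count
    return [{**dict(key), "count": n} for key, n in counts.items()]


students = [
    {"name": "Ann", "major": "CS", "birth_city": "Vancouver", "age": 21},
    {"name": "Bob", "major": "CS", "birth_city": "Toronto", "age": 24},
    {"name": "Cho", "major": "Physics", "birth_city": "Chicago", "age": 31},
]

generalized = attribute_oriented_induction(
    students,
    generalizers={"birth_city": generalize_city, "age": generalize_age},
    drop=("name",),  # too many distinct values and no hierarchy to climb
)
for row in generalized:
    print(row)
```

Because the generalized relation keeps only the higher-level values and counts, the original cities and ages cannot be recovered from it. This is the drill-down limitation noted above.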

4.6 Summary

A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile data
collection organized in support of management decision making. Several factors
distinguish data warehouses from operational databases. Because the two systems
provide quite different functionalities and require different kinds of data, it is
necessary to maintain data warehouses separately from operational databases.

Data warehouses often adopt a three-tier architecture. The bottom tier is a
warehouse database server, which is typically a relational database system. The middle
tier is an OLAP server, and the top tier is a client that contains query and reporting tools.

A data warehouse contains back-end tools and utilities for populating and
refreshing the warehouse. These cover data extraction, data cleaning, data transformation,
loading, refreshing, and warehouse management.

Data warehouse metadata are data defining the warehouse objects. A metadata
repository provides details regarding the warehouse structure, data history, the
algorithms used for summarization, mappings from the source data to the warehouse
form, system performance, and business terms and issues.

A multidimensional data model is typically used for the design of corporate data
warehouses and departmental data marts. Such a model can adopt a star schema,
snowflake schema, or fact constellation schema. The core of the multidimensional
model is the data cube, which consists of a large set of facts (or measures) and a
number of dimensions. Dimensions are the entities or perspectives with respect to
which an organization wants to keep records and are hierarchical in nature.
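
As a rough illustration of facts, dimensions, and dimension hierarchies, the sketch below aggregates a tiny fact table along chosen dimensions and then rolls the city dimension up to country. The fact rows, the `sales` measure, the `CITY_TO_COUNTRY` hierarchy, and the `aggregate` helper are all hypothetical; a real OLAP server would precompute or index such aggregates rather than scan the facts each time.

```python
from collections import defaultdict

# Hypothetical fact table: three dimensions (city, quarter, item), one measure.
FACTS = [
    {"city": "Vancouver", "quarter": "Q1", "item": "phone", "sales": 100.0},
    {"city": "Toronto", "quarter": "Q1", "item": "phone", "sales": 150.0},
    {"city": "Chicago", "quarter": "Q1", "item": "laptop", "sales": 300.0},
    {"city": "Vancouver", "quarter": "Q2", "item": "laptop", "sales": 200.0},
]

# Hypothetical hierarchy on the location dimension: city -> country.
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Chicago": "USA"}


def aggregate(facts, dims, measure="sales", hierarchy=None):
    """Sum the measure grouped by dims.

    hierarchy optionally maps a dimension name to a function that lifts
    its value one level up before grouping.
    """
    hierarchy = hierarchy or {}
    totals = defaultdict(float)
    for fact in facts:
        key = tuple(hierarchy.get(d, lambda v: v)(fact[d]) for d in dims)
        totals[key] += fact[measure]
    return dict(totals)


# One cuboid: sales by (city, quarter).
by_city_quarter = aggregate(FACTS, ["city", "quarter"])

# Climbing the location hierarchy: sales by (country, quarter).
by_country_quarter = aggregate(
    FACTS, ["city", "quarter"], hierarchy={"city": CITY_TO_COUNTRY.__getitem__}
)

# Dropping a dimension: sales by quarter only.
by_quarter = aggregate(FACTS, ["quarter"])

print(by_city_quarter)
print(by_country_quarter)
print(by_quarter)
```

Each call corresponds to one view of the same facts at a different level of abstraction, which is what the hierarchical nature of dimensions makes possible.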
