OLAP operations may be performed on the target and contrasting classes as deemed
necessary by the user in order to adjust the abstraction levels of the final description.
In summary, attribute-oriented induction for data characterization and
generalization provides an alternative data generalization method in comparison
to the data cube approach. It is not confined to relational data because such an
induction can be performed on spatial, multimedia, sequence, and other kinds of
data sets. In addition, there is no need to precompute a data cube because
generalization can be performed online upon receiving a user’s query.
Moreover, automated analysis can be added to such an induction process to
automatically filter out irrelevant or unimportant attributes. However, because
attribute-oriented induction automatically generalizes data to a higher level, it
cannot efficiently support the process of drilling down to levels deeper than those
provided in the generalized relation. The integration of data cube technology with
attribute-oriented induction may provide a balance between precomputation and
online computation. This would also support fast online computation when it is
necessary to drill down to a level deeper than that provided in the generalized
relation.
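The generalization step summarized above can be illustrated with a minimal sketch. It assumes a tiny in-memory relation and two hypothetical concept hierarchies (city to country, exact age to a 10-year range); the names `CITY_TO_COUNTRY`, `age_range`, and `generalize` are illustrative and do not come from the text. The sketch removes an attribute that has no useful generalization (the person's name), climbs each remaining attribute one level, and merges identical generalized tuples while accumulating a count.

```python
from collections import Counter

# Hypothetical concept hierarchy (assumption, for illustration only): city -> country.
CITY_TO_COUNTRY = {
    "Vancouver": "Canada",
    "Toronto": "Canada",
    "Seattle": "USA",
    "Chicago": "USA",
}


def age_range(age):
    """Generalize a numeric age to a 10-year range label, e.g. 23 -> '20-29'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def generalize(rows):
    """Drop the name attribute, climb the hierarchies, and merge duplicates.

    Each input row is (name, city, age). The result maps each generalized
    tuple (country, age range) to the number of original rows it covers.
    """
    counts = Counter()
    for _name, city, age in rows:
        counts[(CITY_TO_COUNTRY[city], age_range(age))] += 1
    return dict(counts)


rows = [
    ("Ann", "Vancouver", 23),
    ("Bob", "Toronto", 27),
    ("Cid", "Seattle", 31),
    ("Dee", "Chicago", 35),
    ("Eve", "Vancouver", 41),
]

generalized = generalize(rows)
for (country, ages), count in generalized.items():
    print(country, ages, count)
```

Once the rows are merged, the generalized relation no longer records the city or the exact age, which is why drilling down below this level requires returning to the base data or to a precomputed cube.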
4.6 Summary
A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile data
collection organized in support of management decision making. Several factors
distinguish data warehouses from operational databases. Because the two systems
provide quite different functionalities and require different kinds of data, it is
necessary to maintain data warehouses separately from operational databases.
Data warehouses often adopt a three-tier architecture. The bottom tier is a
warehouse database server, which is typically a relational database system. The
middle tier is an OLAP server, and the top tier is a client that contains query and
reporting tools.
A data warehouse contains back-end tools and utilities for populating and
refreshing the warehouse. These cover data extraction, data cleaning, data
transformation, loading, refreshing, and warehouse management.
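As a rough illustration of how these back-end steps fit together, the sketch below runs a few hypothetical source records through cleaning, transformation, loading, and a later refresh, using Python's built-in `sqlite3` module as a stand-in for the warehouse server. The record layout, the `sales` table, and the function names are assumptions made for the example; real warehouse utilities are considerably more elaborate.

```python
import sqlite3

# Hypothetical records extracted from an operational source (assumption).
raw_sales = [
    {"id": 1, "city": " vancouver ", "amount": "120.50"},
    {"id": 2, "city": "Toronto", "amount": None},  # missing measure
    {"id": 3, "city": "TORONTO", "amount": "80"},
]


def clean(records):
    """Data cleaning: drop records whose amount is missing."""
    return [r for r in records if r["amount"] is not None]


def transform(records):
    """Data transformation: normalize city names and convert amounts to float."""
    return [(r["id"], r["city"].strip().title(), float(r["amount"])) for r in records]


def load(conn, rows):
    """Loading and refreshing: insert new rows, overwrite rows whose id exists."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, city TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")

# Initial load.
load(conn, transform(clean(raw_sales)))

# Refresh: the source later reports a corrected amount for record 3.
update = [{"id": 3, "city": "toronto", "amount": "95"}]
load(conn, transform(clean(update)))

loaded = conn.execute("SELECT id, city, amount FROM sales ORDER BY id").fetchall()
print(loaded)
```

Here the refresh simply overwrites the affected row by primary key; a production warehouse would more likely propagate updates incrementally and keep historical versions.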
Data warehouse metadata are data defining the warehouse objects. A metadata
repository provides details regarding the warehouse structure, data history, the
algorithms used for summarization, mappings from the source data to the warehouse
form, system performance, and business terms and issues.
A multidimensional data model is typically used for the design of corporate data
warehouses and departmental data marts. Such a model can adopt a star schema,
snowflake schema, or fact constellation schema. The core of the multidimensional
model is the data cube, which consists of a large set of facts (or measures) and a
number of dimensions. Dimensions are the entities or perspectives with respect to
which an organization wants to keep records and are hierarchical in nature.
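To make the star schema and its hierarchical dimensions concrete, the following sketch builds a small schema in an in-memory SQLite database: one fact table holding a measure (`dollars_sold`) and two dimension tables, for location and time. All table names, column names, and values are hypothetical and chosen only for illustration. The helper `total_by` aggregates the measure at a chosen level of the location hierarchy, so moving from city to country corresponds to a roll-up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: each carries a hierarchy (city < country, quarter < year).
    CREATE TABLE location_dim (
        location_key INTEGER PRIMARY KEY, city TEXT, country TEXT
    );
    CREATE TABLE time_dim (
        time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER
    );
    -- Fact table: foreign keys to the dimensions plus one measure.
    CREATE TABLE sales_fact (
        time_key INTEGER REFERENCES time_dim(time_key),
        location_key INTEGER REFERENCES location_dim(location_key),
        dollars_sold REAL
    );
    INSERT INTO location_dim VALUES
        (1, 'Vancouver', 'Canada'), (2, 'Toronto', 'Canada'), (3, 'Chicago', 'USA');
    INSERT INTO time_dim VALUES (1, 'Q1', 2010), (2, 'Q2', 2010);
    INSERT INTO sales_fact VALUES
        (1, 1, 100.0), (1, 2, 150.0), (2, 1, 200.0), (2, 3, 50.0);
    """
)


def total_by(level):
    """Sum dollars_sold grouped by one level of the location hierarchy."""
    if level not in ("city", "country"):  # whitelist column names
        raise ValueError(f"unknown level: {level}")
    query = (
        f"SELECT l.{level}, SUM(f.dollars_sold) "
        "FROM sales_fact f JOIN location_dim l "
        "ON f.location_key = l.location_key "
        f"GROUP BY l.{level} ORDER BY l.{level}"
    )
    return conn.execute(query).fetchall()


by_city = total_by("city")        # finer level of the hierarchy
by_country = total_by("country")  # roll-up to the coarser level
print(by_city)
print(by_country)
```

The time dimension is loaded but not queried here; grouping by `quarter` or `year` through a join on `time_dim` would follow the same pattern.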