4.6 Summary
A data cube consists of a lattice of cuboids, each corresponding to a different degree
of summarization of the given multidimensional data.
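
As an illustration of the lattice, the following minimal sketch enumerates the cuboids for a set of dimensions. The dimension names are hypothetical, and concept hierarchies are ignored, so n dimensions yield 2^n cuboids.

```python
from itertools import combinations

def cuboid_lattice(dimensions):
    """Group every cuboid (subset of dimensions) by the number of dimensions it keeps."""
    lattice = {}
    for k in range(len(dimensions) + 1):
        lattice[k] = list(combinations(dimensions, k))
    return lattice

# Hypothetical dimensions, chosen only for illustration.
dims = ["time", "item", "location"]
lattice = cuboid_lattice(dims)

# Level 0 holds the apex cuboid (all dimensions aggregated away);
# the top level holds the base cuboid (no summarization).
total = sum(len(level) for level in lattice.values())
for k, cuboids in lattice.items():
    print(k, cuboids)
print("cuboids:", total)
```

Each level of the dictionary corresponds to one degree of summarization, from the apex cuboid at level 0 to the base cuboid at level 3.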
Concept hierarchies organize the values of attributes or dimensions into gradual
abstraction levels. They are useful in mining at multiple abstraction levels.
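
A concept hierarchy can be pictured as a chain of mappings from lower-level values to higher-level ones. The sketch below assumes a hypothetical location hierarchy (city, country, continent); the values and the helper `climb` are illustrative only.

```python
# Hypothetical concept hierarchy for a location dimension.
CITY_TO_COUNTRY = {
    "Vancouver": "Canada",
    "Toronto": "Canada",
    "Chicago": "USA",
    "New York": "USA",
}
COUNTRY_TO_CONTINENT = {"Canada": "North America", "USA": "North America"}

# Ordered from the lowest abstraction level upward.
LOCATION_HIERARCHY = [CITY_TO_COUNTRY, COUNTRY_TO_CONTINENT]

def climb(value, hierarchy, levels):
    """Generalize a value by moving it up the given number of hierarchy levels."""
    for mapping in hierarchy[:levels]:
        value = mapping[value]
    return value

print(climb("Vancouver", LOCATION_HIERARCHY, 1))
print(climb("Vancouver", LOCATION_HIERARCHY, 2))
```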
Online analytical processing can be performed in data warehouses/marts using
the multidimensional data model. Typical OLAP operations include roll-up,
drill-(down, across, through), slice-and-dice, and pivot (rotate), as well as statistical
operations such as ranking and computing moving averages and growth rates. OLAP
operations can be implemented efficiently using the data cube structure.
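
To make two of these operations concrete, here is a minimal sketch of roll-up and slice over a small in-memory fact table. The table contents, the city-to-country mapping, and the function names are assumptions made for illustration; a real OLAP server would operate on a precomputed cube rather than scan raw rows.

```python
from collections import defaultdict

# Hypothetical fact table: (quarter, city, item, sales).
FACTS = [
    ("Q1", "Vancouver", "phone", 100),
    ("Q1", "Toronto", "phone", 150),
    ("Q1", "Chicago", "laptop", 200),
    ("Q2", "Vancouver", "laptop", 50),
    ("Q2", "Chicago", "phone", 300),
]
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Chicago": "USA"}

def roll_up_city_to_country(facts):
    """Roll-up: climb the location hierarchy from city to country and re-aggregate."""
    totals = defaultdict(int)
    for quarter, city, item, sales in facts:
        totals[(quarter, CITY_TO_COUNTRY[city], item)] += sales
    return dict(totals)

def slice_on_quarter(facts, quarter):
    """Slice: select on one dimension (time), leaving a subcube over the others."""
    return [row for row in facts if row[0] == quarter]

rolled = roll_up_city_to_country(FACTS)
q1 = slice_on_quarter(FACTS, "Q1")
print(rolled)
print(q1)
```

Drill-down is the inverse of the roll-up shown here: it would return from the country level to the city-level rows.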
Data warehouses are used for information processing (querying and reporting),
analytical processing (which allows users to navigate through summarized and
detailed data by OLAP operations), and data mining (which supports knowledge
discovery). OLAP-based data mining is referred to as multidimensional data mining
(also known as exploratory multidimensional data mining, online analytical
mining, or OLAM). It emphasizes the interactive and exploratory nature of data
mining.
OLAP servers may adopt a relational OLAP (ROLAP), a multidimensional OLAP
(MOLAP), or a hybrid OLAP (HOLAP) implementation. A ROLAP server uses an
extended relational DBMS that maps OLAP operations on multidimensional data to
standard relational operations. A MOLAP server maps multidimensional data views
directly to array structures. A HOLAP server combines ROLAP and MOLAP. For
example, it may use ROLAP for historic data while maintaining frequently accessed
data in a separate MOLAP store.
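
The contrast between the two storage styles can be sketched as follows. This is a simplified illustration, not how any particular OLAP server is built: the ROLAP side uses Python's built-in `sqlite3` module with a standard GROUP BY, and the MOLAP side uses a dense nested list addressed by dimension position. The table name, rows, and dimension values are hypothetical.

```python
import sqlite3

# Hypothetical sales rows: (quarter, item, amount).
ROWS = [
    ("Q1", "phone", 100),
    ("Q1", "laptop", 200),
    ("Q2", "phone", 300),
    ("Q2", "laptop", 50),
    ("Q1", "phone", 25),
]

# ROLAP style: the roll-up to quarter is expressed as a relational GROUP BY.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (quarter TEXT, item TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", ROWS)
rolap_totals = dict(
    conn.execute("SELECT quarter, SUM(amount) FROM sales GROUP BY quarter").fetchall()
)
conn.close()

# MOLAP style: cells live in an array indexed by dimension positions.
QUARTERS = ["Q1", "Q2"]
ITEMS = ["phone", "laptop"]
cube = [[0] * len(ITEMS) for _ in QUARTERS]
for quarter, item, amount in ROWS:
    cube[QUARTERS.index(quarter)][ITEMS.index(item)] += amount
molap_totals = {q: sum(cube[i]) for i, q in enumerate(QUARTERS)}

print(rolap_totals)
print(molap_totals)
```

Both paths produce the same per-quarter totals; they differ in where the data lives and how a cell is addressed.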
Full materialization refers to the computation of all of the cuboids in the lattice
defining a data cube. It typically requires an excessive amount of storage space,
particularly as the number of dimensions and size of associated concept hierarchies
grow. This problem is known as the curse of dimensionality. Alternatively, partial
materialization is the selective computation of a subset of the cuboids or subcubes
in the lattice. For example, an iceberg cube is a data cube that stores only those
cube cells that have an aggregate value (e.g., count) above some minimum support
threshold.
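
The iceberg condition can be illustrated with a deliberately naive sketch that counts every cell of every cuboid and then keeps only the cells meeting the threshold. The rows and the threshold are hypothetical, "*" marks an aggregated dimension, and the test `count >= min_sup` is an assumption about how the threshold is applied. Efficient iceberg cube methods avoid materializing the full cube first; this version is for exposition only.

```python
from collections import Counter
from itertools import combinations

def iceberg_cube(rows, n_dims, min_sup):
    """Count every cell in every cuboid, then keep cells with count >= min_sup."""
    cells = Counter()
    for row in rows:
        for k in range(n_dims + 1):
            for kept in combinations(range(n_dims), k):
                # Dimensions not kept are aggregated away and shown as "*".
                cell = tuple(row[i] if i in kept else "*" for i in range(n_dims))
                cells[cell] += 1
    return {cell: count for cell, count in cells.items() if count >= min_sup}

# Hypothetical base rows over two dimensions: (quarter, item).
ROWS = [("Q1", "phone"), ("Q1", "phone"), ("Q1", "laptop"), ("Q2", "phone")]
iceberg = iceberg_cube(ROWS, n_dims=2, min_sup=2)
print(iceberg)
```

With these four rows the full cube has eight nonempty cells, of which four survive a threshold of 2.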
OLAP query processing can be made more efficient with the use of indexing techniques.
In bitmap indexing, each attribute has its own bitmap index table. Bitmap
indexing reduces join, aggregation, and comparison operations to bit arithmetic.
Join indexing registers the joinable rows of two or more relations from a relational
database, reducing the overall cost of OLAP join operations. Bitmapped join indexing,
which combines the bitmap and join index methods, can be used to further
speed up OLAP query processing.
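
A minimal sketch of bitmap indexing follows, using Python integers as bit vectors (bit i is set when row i holds the value). The column contents and helper names are hypothetical. It shows how a conjunctive or disjunctive selection reduces to bitwise AND or OR.

```python
def build_bitmap_index(column):
    """Map each distinct value to an integer whose set bits mark the matching rows."""
    index = {}
    for row_id, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index

def rows_of(bitmap):
    """Decode a bitmap back into the list of row ids whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical columns of a five-row table.
ITEM = ["phone", "laptop", "phone", "tv", "laptop"]
CITY = ["Vancouver", "Vancouver", "Toronto", "Toronto", "Vancouver"]

item_index = build_bitmap_index(ITEM)
city_index = build_bitmap_index(CITY)

# item = 'laptop' AND city = 'Vancouver' becomes a single bitwise AND.
both = item_index["laptop"] & city_index["Vancouver"]
# item = 'phone' OR item = 'tv' becomes a single bitwise OR.
either = item_index["phone"] | item_index["tv"]

print(rows_of(both), bin(both).count("1"))
print(rows_of(either))
```

Counting the set bits gives the COUNT aggregate for the selection without touching the underlying rows.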
Data generalization is a process that abstracts a large set of task-relevant data
in a database from a relatively low conceptual level to higher conceptual levels.
Data generalization approaches include data cube-based data aggregation and