                                 4.6 Summary


                                 A data cube consists of a lattice of cuboids, each corresponding to a different degree
                                 of summarization of the given multidimensional data.
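
As a minimal sketch of the lattice idea, the snippet below enumerates every cuboid for a small cube, treating each cuboid as a subset of the dimensions and ignoring concept hierarchies. The dimension names (`time`, `item`, `location`) and the helper `cuboid_lattice` are illustrative assumptions, not part of the text above.

```python
from itertools import combinations

def cuboid_lattice(dimensions):
    """Group every subset of the dimensions (one cuboid each) by its size."""
    lattice = {}
    for k in range(len(dimensions) + 1):
        # Level 0 holds the apex cuboid (); the top level holds the base cuboid.
        lattice[k] = list(combinations(dimensions, k))
    return lattice

# Hypothetical dimensions, chosen only for illustration.
dims = ("time", "item", "location")
lattice = cuboid_lattice(dims)

for level, cuboids in lattice.items():
    print(level, cuboids)

# With n dimensions and no hierarchies there are 2**n cuboids in total.
total_cuboids = sum(len(cuboids) for cuboids in lattice.values())
```

Level 0 is the most summarized cuboid (no grouping dimensions) and the highest level is the least summarized one, which matches the "different degree of summarization" described above.
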
                                 Concept hierarchies organize the values of attributes or dimensions into gradual
                                 abstraction levels. They are useful in mining at multiple abstraction levels.
                                 Online analytical processing can be performed in data warehouses/marts using
                                 the multidimensional data model. Typical OLAP operations include roll-up,
                                 drill-(down, across, through), slice-and-dice, and pivot (rotate), as well as statistical
                                 operations such as ranking and computing moving averages and growth rates. OLAP
                                 operations can be implemented efficiently using the data cube structure.
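
The following is a rough sketch of roll-up, slice, dice, and pivot over a tiny in-memory fact table. The rows, the city-to-country concept hierarchy, and the helper `aggregate` are all hypothetical; a real OLAP server would run these operations against a stored data cube instead of a Python list.

```python
from collections import defaultdict

# Hypothetical fact rows: (quarter, city, item, sales)
facts = [
    ("Q1", "Vancouver", "phone", 10),
    ("Q1", "Toronto", "phone", 20),
    ("Q1", "Chicago", "laptop", 30),
    ("Q2", "Vancouver", "laptop", 40),
    ("Q2", "Chicago", "phone", 50),
]

# Hypothetical concept hierarchy for the location dimension: city -> country
city_to_country = {"Vancouver": "Canada", "Toronto": "Canada", "Chicago": "USA"}

def aggregate(rows, key):
    """Sum the sales measure over groups defined by key(row)."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[3]
    return dict(totals)

# Roll-up: climb the location hierarchy from city to country.
rollup = aggregate(facts, lambda r: (r[0], city_to_country[r[1]]))

# Slice: select on a single dimension (quarter = "Q1").
slice_q1 = [r for r in facts if r[0] == "Q1"]

# Dice: select on two or more dimensions.
dice = [r for r in facts if r[0] == "Q1" and r[2] == "phone"]

# Pivot: swap the axes of a 2-D (quarter, item) view.
by_quarter_item = aggregate(facts, lambda r: (r[0], r[2]))
pivoted = {(item, quarter): total for (quarter, item), total in by_quarter_item.items()}
```

Drill-down is the reverse of the roll-up shown here: it would regroup by city instead of country.
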
                                 Data warehouses are used for information processing (querying and reporting),
                                 analytical processing (which allows users to navigate through summarized and
                                 detailed data by OLAP operations), and data mining (which supports knowledge
                                 discovery). OLAP-based data mining is referred to as multidimensional data mining
                                 (also known as exploratory multidimensional data mining, online analytical
                                 mining, or OLAM). It emphasizes the interactive and exploratory nature of data
                                 mining.
                                 OLAP servers may adopt a relational OLAP (ROLAP), a multidimensional OLAP
                                 (MOLAP), or a hybrid OLAP (HOLAP) implementation. A ROLAP server uses an
                                 extended relational DBMS that maps OLAP operations on multidimensional data to
                                 standard relational operations. A MOLAP server maps multidimensional data views
                                 directly to array structures. A HOLAP server combines ROLAP and MOLAP. For
                                 example, it may use ROLAP for historic data while maintaining frequently accessed
                                 data in a separate MOLAP store.
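
To make the ROLAP/MOLAP contrast concrete, here is a small sketch that computes the same aggregate two ways: as a relational `GROUP BY` (ROLAP style, using Python's built-in `sqlite3` module) and as a sum over a dense array (MOLAP style). The table name, column names, and data are assumptions made for illustration; production servers use far more elaborate storage and query layers.

```python
import sqlite3

# Hypothetical fact rows: (quarter, item, amount)
rows = [("Q1", "phone", 10), ("Q1", "laptop", 30), ("Q2", "phone", 50), ("Q2", "phone", 5)]

# ROLAP style: the aggregate is expressed as a standard relational operation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (quarter TEXT, item TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
rolap_by_quarter = dict(
    conn.execute("SELECT quarter, SUM(amount) FROM sales GROUP BY quarter")
)
conn.close()

# MOLAP style: each cell lives at a fixed position in a dense array,
# addressed by the index of its value along each dimension.
quarters = ["Q1", "Q2"]
items = ["phone", "laptop"]
cube = [[0] * len(items) for _ in quarters]
for quarter, item, amount in rows:
    cube[quarters.index(quarter)][items.index(item)] += amount

# Aggregating over the item dimension is a sum along one array axis.
molap_by_quarter = {q: sum(cube[i]) for i, q in enumerate(quarters)}
```

Both paths yield the same totals per quarter. A HOLAP design, as described above, would route each query to whichever of the two stores holds the relevant data.
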
                                 Full materialization refers to the computation of all of the cuboids in the lattice
                                 defining a data cube. It typically requires an excessive amount of storage space,
                                 particularly as the number of dimensions and size of associated concept hierarchies
                                 grow. This problem is known as the curse of dimensionality. Alternatively, partial
                                 materialization is the selective computation of a subset of the cuboids or subcubes
                                 in the lattice. For example, an iceberg cube is a data cube that stores only those
                                 cube cells that have an aggregate value (e.g., count) above some minimum support
                                 threshold.
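
The sketch below illustrates the difference between full materialization and an iceberg cube on a toy data set. It naively computes every cell of every cuboid first and then filters by count, which is only meant to show the definition; practical iceberg-cube algorithms avoid materializing the full cube. The rows, the `"*"` wildcard notation for aggregated dimensions, and the use of `>=` for the threshold test are assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical base rows: (quarter, item, city)
rows = [
    ("Q1", "phone", "Vancouver"),
    ("Q1", "phone", "Toronto"),
    ("Q1", "laptop", "Vancouver"),
    ("Q2", "phone", "Vancouver"),
]

def full_cube_counts(rows, n_dims):
    """Count tuples for every cell of every cuboid (full materialization)."""
    counts = Counter()
    for row in rows:
        for k in range(n_dims + 1):
            for kept in combinations(range(n_dims), k):
                # "*" marks a dimension that has been aggregated away.
                cell = tuple(row[i] if i in kept else "*" for i in range(n_dims))
                counts[cell] += 1
    return counts

def iceberg(counts, min_sup):
    """Keep only the cells whose count meets the minimum support threshold."""
    return {cell: c for cell, c in counts.items() if c >= min_sup}

full = full_cube_counts(rows, n_dims=3)
ice = iceberg(full, min_sup=2)

print(len(full), "cells in the full cube;", len(ice), "cells in the iceberg cube")
```

Even on four rows the full cube holds several times more cells than the iceberg cube, which hints at why storage grows so quickly as dimensions are added.
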
                                 OLAP query processing can be made more efficient with the use of indexing techniques.
                                 In bitmap indexing, each attribute has its own bitmap index table. Bitmap
                                 indexing reduces join, aggregation, and comparison operations to bit arithmetic.
                                 Join indexing registers the joinable rows of two or more relations from a relational
                                 database, reducing the overall cost of OLAP join operations. Bitmapped join indexing,
                                 which combines the bitmap and join index methods, can be used to further
                                 speed up OLAP query processing.
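
As a minimal illustration of how bitmap indexing turns comparison and aggregation into bit arithmetic, the sketch below keeps one bit vector per distinct attribute value, stored as a Python integer. The column data and the helper `build_bitmap_index` are hypothetical, and real systems typically compress their bitmaps.

```python
def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i has that value."""
    index = {}
    for row_id, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index

# Hypothetical columns of a five-row table.
items = ["phone", "laptop", "phone", "tv", "phone"]
cities = ["Vancouver", "Vancouver", "Toronto", "Toronto", "Vancouver"]

item_index = build_bitmap_index(items)
city_index = build_bitmap_index(cities)

# "item = phone AND city = Vancouver" becomes a single bitwise AND.
match = item_index["phone"] & city_index["Vancouver"]

# Recover the matching row ids from the set bits.
matching_rows = [i for i in range(len(items)) if (match >> i) & 1]

# COUNT(*) for the predicate is the number of set bits.
match_count = bin(match).count("1")
```

An OR of two bit vectors would answer a disjunctive predicate in the same way, without scanning the base table.
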
                                 Data generalization is a process that abstracts a large set of task-relevant data
                                 in a database from a relatively low conceptual level to higher conceptual levels.
                                 Data generalization approaches include data cube-based data aggregation and