Page 197 -
P. 197

11-ch04-125-186-9780123814791
                         HAN
                                                            2011/6/1
          160   Chapter 4 Data Warehousing and Online Analytical Processing  3:17 Page 160  #36



                         The partial materialization of cuboids or subcubes should consider three factors: (1)
                         identify the subset of cuboids or subcubes to materialize; (2) exploit the materialized
                         cuboids or subcubes during query processing; and (3) efficiently update the materialized
                         cuboids or subcubes during load and refresh.
                           The selection of the subset of cuboids or subcubes to materialize should take into
                         account the queries in the workload, their frequencies, and their accessing costs. In addi-
                         tion, it should consider workload characteristics, the cost for incremental updates, and
                         the total storage requirements. The selection must also consider the broad context of
                         physical database design such as the generation and selection of indices. Several OLAP
                         products have adopted heuristic approaches for cuboid and subcube selection. A pop-
                         ular approach is to materialize the cuboids set on which other frequently referenced
                         cuboids are based. Alternatively, we can compute an iceberg cube, which is a data cube
                         that stores only those cube cells with an aggregate value (e.g., count) that is above some
                         minimum support threshold.
                           Another common strategy is to materialize a shell cube. This involves precomput-
                         ing the cuboids for only a small number of dimensions (e.g., three to five) of a data
                         cube. Queries on additional combinations of the dimensions can be computed on-the-
                         fly. Because our aim in this chapter is to provide a solid introduction and overview of
                         data warehousing for data mining, we defer our detailed discussion of cuboid selection
                         and computation to Chapter 5, which studies various data cube computation methods
                         in greater depth.
                           Once the selected cuboids have been materialized, it is important to take advantage of
                         them during query processing. This involves several issues, such as how to determine the
                         relevant cuboid(s) from among the candidate materialized cuboids, how to use available
                         index structures on the materialized cuboids, and how to transform the OLAP opera-
                         tions onto the selected cuboid(s). These issues are discussed in Section 4.4.3 as well as in
                         Chapter 5.
                           Finally, during load and refresh, the materialized cuboids should be updated effi-
                         ciently. Parallelism and incremental update techniques for this operation should be
                         explored.

                   4.4.2 Indexing OLAP Data: Bitmap Index and Join Index
                         To facilitate efficient data accessing, most data warehouse systems support index struc-
                         tures and materialized views (using cuboids). General methods to select cuboids for
                         materialization were discussed in Section 4.4.1. In this subsection, we examine how to
                         index OLAP data by bitmap indexing and join indexing.
                           The bitmap indexing method is popular in OLAP products because it allows quick
                         searching in data cubes. The bitmap index is an alternative representation of the
                         record ID (RID) list. In the bitmap index for a given attribute, there is a distinct bit
                         vector, Bv, for each value v in the attribute’s domain. If a given attribute’s domain con-
                         sists of n values, then n bits are needed for each entry in the bitmap index (i.e., there are
                         n bit vectors). If the attribute has the value v for a given row in the data table, then the
                         bit representing that value is set to 1 in the corresponding row of the bitmap index. All
                         other bits for that row are set to 0.
   192   193   194   195   196   197   198   199   200   201   202