Page 253 -
P. 253

HAN
                               12-ch05-187-242-9780123814791

          216   Chapter 5 Data Cube Technology              2011/6/1  3:19 Page 216  #30



                           The shell fragments are negligible in both storage space and computation time in
                         comparison with the full data cube. Note that we can also use the Frag-Shells algorithm
                         to compute the full data cube by including all the dimensions as a single fragment.
                         Because the order of computation with respect to the cuboid lattice is top-down and
                         depth-first (similar to that of BUC), the algorithm can perform Apriori pruning if
                         applied to the construction of iceberg cubes.
                           “Once we have computed the shell fragments, how can they be used to answer OLAP
                         queries?” Given the precomputed shell fragments, we can view the cube space as a virtual
                         cube and perform OLAP queries related to the cube online. In general, two types of
                         queries are possible: (1) point query and (2) subcube query.
                           In a point query, all of the relevant dimensions in the cube have been instantiated
                         (i.e., there are no inquired dimensions in the relevant dimensions set). For example,
                         in an n-dimensional data cube, A 1 A 2 ...A n , a point query could be in the form of
                         hA 1 , A 5 , A 9 : M?i, where A 1 = {a 11 , a 18 }, A 5 = {a 52 , a 55 , a 59 }, A 9 = a 94 , and M is the
                         inquired measure for each corresponding cube cell. For a cube with a small number
                         of dimensions, we can use ∗ to represent a “don’t care” position where the correspond-
                         ing dimension is irrelevant, that is, neither inquired nor instantiated. For example, in the
                         query ha 2 , b 1 , c 1 , d 1 , ∗ :count()?i for the database in Table 5.4, the first four dimension
                         values are instantiated to a 2 , b 1 , c 1 , and d 1 , respectively, while the last dimension is
                         irrelevant, and count() (which is the tuple count by context) is the inquired measure.
                           In a subcube query, at least one of the relevant dimensions in the cube is inquired.
                         For example, in an n-dimensional data cube A 1 A 2 ...A n , a subcube query could be in the
                         form hA 1 , A 5 ?, A 9 , A 21 ? : M?i, where A 1 = {a 11 , a 18 } and A 9 = a 94 , A 5 and A 21 are the
                         inquired dimensions, and M is the inquired measure. For a cube with a small number
                         of dimensions, we can use ∗ for an irrelevant dimension and ? for an inquired one. For
                         example, in the query ha 2 , ?, c 1 , ∗ , ? : count() ?i we see that the first and third dimension
                         values are instantiated to a 2 and c 1 , respectively, while the fourth is irrelevant, and the
                         second and the fifth are inquired. A subcube query computes all possible value combina-
                         tions of the inquired dimensions. It essentially returns a local data cube consisting of the
                         inquired dimensions.
                           “How can we use shell fragments to answer a point query?” Because a point query
                         explicitly provides the instantiated variables set on the relevant dimensions set, we can
                         make maximal use of the precomputed shell fragments by finding the best fitting (i.e.,
                         dimension-wise completely matching) fragments to fetch and intersect the associated
                         TID lists.
                           Let the point query be of the form hα i , α j , α k , α p : M?i, where α i represents a set of
                         instantiated values of dimension A i , and so on for α j , α k , and α p . First, we check the
                         shell fragment schema to determine which dimensions among A i , A j , A k , and A p are in
                         the same fragment(s). Suppose A i and A j are in the same fragment, while A k and A p are
                         in two other fragments. We fetch the corresponding TID lists on the precomputed 2-D
                         fragment for dimensions A i and A j using the instantiations α i and α j , and fetch the TID
                         lists on the 1-D fragments for dimensions A k and A p using the instantiations α k and α p ,
                         respectively. The obtained TID lists are intersected to derive the TID list table. This table
                         is then used to derive the specified measure (e.g., by taking the length of the TID lists
   248   249   250   251   252   253   254   255   256   257   258