Page 254 -
P. 254

12-ch05-187-242-9780123814791
                                                            2011/6/1
                         HAN
                                                                     3:19 Page 217
                                                                                    #31
                                                               5.2 Data Cube Computation Methods  217


                               for tuple count(), or by fetching item count() and sum() from the ID measure array to
                               compute average()) for the final set of cells.

                 Example 5.12 Point query. Suppose a user wants to compute the point query ha 2 , b 1 , c 1 , d 1 , ∗: count()?i
                               for our database in Table 5.4 and that the shell fragments for the partitions (A, B, C)
                               and (D, E) are precomputed as described in Example 5.10. The query is broken down
                               into two subqueries based on the precomputed fragments: ha 2 , b 1 , c 1 , ∗ , ∗i and h∗, ∗ ,
                               ∗ , d 1 , ∗i. The best-fit precomputed shell fragments for the two subqueries are ABC and
                               D. The fetch of the TID lists for the two subqueries returns two lists: {4, 5} and {1, 3,
                               4, 5}. Their intersection is the list {4, 5}, which is of size 2. Thus, the final answer is
                               count() = 2.

                                 “How can we use shell fragments to answer a subcube query?” A subcube query returns
                               a local data cube based on the instantiated and inquired dimensions. Such a data cube
                               needs to be aggregated in a multidimensional way so that online analytical processing
                               (drilling, dicing, pivoting, etc.) can be made available to users for flexible manipulation
                               and analysis. Because instantiated dimensions usually provide highly selective constants
                               that dramatically reduce the size of the valid TID lists, we should make maximal use of
                               the precomputed shell fragments by finding the fragments that best fit the set of instan-
                               tiated dimensions, and fetching and intersecting the associated TID lists to derive the
                               reduced TID list. This list can then be used to intersect the best-fitting shell fragments
                               consisting of the inquired dimensions. This will generate the relevant and inquired base
                               cuboid, which can then be used to compute the relevant subcube on-the-fly using an
                               efficient online cubing algorithm.
                                 Let the subcube query be of the form hα i , α j , A k ?, α p , A q ? : M?i, where α i , α j , and
                               α p represent a set of instantiated values of dimension A i , A j , and A p , respectively, and A k
                               and A q represent two inquired dimensions. First, we check the shell fragment schema
                               to determine which dimensions among (1) A i , A j , and A p , and (2) A k and A q are in
                               the same fragment partition. Suppose A i and A j belong to the same fragment, as do A k
                               and A q , but that A p is in a different fragment. We fetch the corresponding TID lists in
                               the precomputed 2-D fragment for A i and A j using the instantiations α i and α j , then
                               fetch the TID list on the precomputed 1-D fragment for A p using instantiation α p , and
                               then fetch the TID lists on the precomputed 2-D fragments for A k and A q , respectively,
                               using no instantiations (i.e., all possible values). The obtained TID lists are intersected
                               to derive the final TID lists, which are used to fetch the corresponding measures from
                               the ID measure array to derive the “base cuboid” of a 2-D subcube for two dimensions
                               (A k , A q ). A fast cube computation algorithm can be applied to compute this 2-D cube
                               based on the derived base cuboid. The computed 2-D cube is then ready for OLAP
                               operations.

                 Example 5.13 Subcube query. Suppose that a user wants to compute the subcube query, ha 2 , b 1 , ?, ∗
                               , ? : count()?i, for our database shown earlier in Table 5.4, and that the shell fragments
                               have been precomputed as described in Example 5.10. The query can be broken into
                               three best-fit fragments according to the instantiated and inquired dimensions: AB, C,
   249   250   251   252   253   254   255   256   257   258   259