Page 253 -
P. 253
HAN
12-ch05-187-242-9780123814791
216 Chapter 5 Data Cube Technology 2011/6/1 3:19 Page 216 #30
The shell fragments are negligible in both storage space and computation time in
comparison with the full data cube. Note that we can also use the Frag-Shells algorithm
to compute the full data cube by including all the dimensions as a single fragment.
Because the order of computation with respect to the cuboid lattice is top-down and
depth-first (similar to that of BUC), the algorithm can perform Apriori pruning if
applied to the construction of iceberg cubes.
“Once we have computed the shell fragments, how can they be used to answer OLAP
queries?” Given the precomputed shell fragments, we can view the cube space as a virtual
cube and perform OLAP queries related to the cube online. In general, two types of
queries are possible: (1) point query and (2) subcube query.
In a point query, all of the relevant dimensions in the cube have been instantiated
(i.e., there are no inquired dimensions in the relevant dimensions set). For example,
in an n-dimensional data cube, A 1 A 2 ...A n , a point query could be in the form of
hA 1 , A 5 , A 9 : M?i, where A 1 = {a 11 , a 18 }, A 5 = {a 52 , a 55 , a 59 }, A 9 = a 94 , and M is the
inquired measure for each corresponding cube cell. For a cube with a small number
of dimensions, we can use ∗ to represent a “don’t care” position where the correspond-
ing dimension is irrelevant, that is, neither inquired nor instantiated. For example, in the
query ha 2 , b 1 , c 1 , d 1 , ∗ :count()?i for the database in Table 5.4, the first four dimension
values are instantiated to a 2 , b 1 , c 1 , and d 1 , respectively, while the last dimension is
irrelevant, and count() (which is the tuple count by context) is the inquired measure.
In a subcube query, at least one of the relevant dimensions in the cube is inquired.
For example, in an n-dimensional data cube A 1 A 2 ...A n , a subcube query could be in the
form hA 1 , A 5 ?, A 9 , A 21 ? : M?i, where A 1 = {a 11 , a 18 } and A 9 = a 94 , A 5 and A 21 are the
inquired dimensions, and M is the inquired measure. For a cube with a small number
of dimensions, we can use ∗ for an irrelevant dimension and ? for an inquired one. For
example, in the query ha 2 , ?, c 1 , ∗ , ? : count() ?i we see that the first and third dimension
values are instantiated to a 2 and c 1 , respectively, while the fourth is irrelevant, and the
second and the fifth are inquired. A subcube query computes all possible value combina-
tions of the inquired dimensions. It essentially returns a local data cube consisting of the
inquired dimensions.
“How can we use shell fragments to answer a point query?” Because a point query
explicitly provides the instantiated variables set on the relevant dimensions set, we can
make maximal use of the precomputed shell fragments by finding the best fitting (i.e.,
dimension-wise completely matching) fragments to fetch and intersect the associated
TID lists.
Let the point query be of the form hα i , α j , α k , α p : M?i, where α i represents a set of
instantiated values of dimension A i , and so on for α j , α k , and α p . First, we check the
shell fragment schema to determine which dimensions among A i , A j , A k , and A p are in
the same fragment(s). Suppose A i and A j are in the same fragment, while A k and A p are
in two other fragments. We fetch the corresponding TID lists on the precomputed 2-D
fragment for dimensions A i and A j using the instantiations α i and α j , and fetch the TID
lists on the 1-D fragments for dimensions A k and A p using the instantiations α k and α p ,
respectively. The obtained TID lists are intersected to derive the TID list table. This table
is then used to derive the specified measure (e.g., by taking the length of the TID lists