HAN
Chapter 5 Data Cube Technology
and E, where AB has the instantiation $(a_2, b_1)$. The fetch of the TID lists for these
partitions returns $(a_2, b_1) : \{4, 5\}$, $(c_1) : \{1, 2, 3, 4, 5\}$, and
$\{(e_1 : \{1, 2\}), (e_2 : \{3, 4\}), (e_3 : \{5\})\}$, respectively. The intersection of
these corresponding TID lists contains a cuboid with two tuples:
$\{(c_1, e_2) : \{4\}, (c_1, e_3) : \{5\}\}$.⁵ This base cuboid can be used to compute the
2-D data cube, which is trivial.
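The intersection step in this example can be reproduced with a few lines of Python. This is only a minimal sketch, assuming the TID lists have already been fetched from the shell fragments and are held in memory as sets; the function name `base_cuboid` and the dictionary layout are hypothetical choices made for illustration, not part of Frag-Shells itself.

```python
from itertools import product

# TID lists copied from the example above (assumed already fetched).
# Instantiated dimensions A and B: one fixed cell.
instantiated = {("a2", "b1"): {4, 5}}
# Inquired dimensions C and E: one dict of value -> TID list per dimension.
inquired = [
    {("c1",): {1, 2, 3, 4, 5}},
    {("e1",): {1, 2}, ("e2",): {3, 4}, ("e3",): {5}},
]

def base_cuboid(instantiated, inquired):
    """Intersect TID lists to build the base cuboid of the inquired dimensions.

    Hypothetical helper: assumes at least one instantiated cell is given.
    """
    # TIDs that satisfy every instantiated value.
    base = set.intersection(*instantiated.values())
    cuboid = {}
    # Try every combination of one value per inquired dimension.
    for combo in product(*(dim.items() for dim in inquired)):
        cell = tuple(value for key, _ in combo for value in key)
        tids = base.intersection(*(tid_list for _, tid_list in combo))
        if tids:  # keep only nonempty cells
            cuboid[cell] = tids
    return cuboid

result = base_cuboid(instantiated, inquired)
print(result)
```

For the lists above, the cell (c1, e1) has an empty intersection and is dropped, leaving the two tuples given in the text. A count measure for each remaining cell is simply the size of its TID list.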
For large data sets, a fragment size of 2 or 3 typically yields reasonable storage
requirements for the shell fragments and fast query response time. Querying with
shell fragments is substantially faster than answering queries using precomputed data
cubes that are stored on disk. In comparison to full cube computation, Frag-Shells is
recommended if there are fewer than four inquired dimensions. Otherwise, more efficient
algorithms, such as Star-Cubing, can be used for fast online cube computation.
Frag-Shells can be easily extended to allow incremental updates, the details of which
are left as an exercise.
5.3 Processing Advanced Kinds of Queries by Exploring Cube Technology
Data cubes are not confined to the simple multidimensional structure illustrated in the
preceding section for typical business data warehouse applications. The methods described
in this section further develop data cube technology for effective processing of advanced
kinds of queries. Section 5.3.1 explores sampling cubes. This extension of data cube
technology can be used to answer queries on sample data, such as survey data, which
represent a sample or subset of a target data population of interest. Section 5.3.2
explains how ranking cubes can be computed to answer top-k queries, such as “find the
top 5 cars,” according to some user-specified criteria.
The basic data cube structure has been further extended for various sophisticated
data types and new applications. Examples include spatial data cubes for the design and
implementation of geospatial data warehouses, and multimedia data cubes for the
multidimensional analysis of multimedia data (those containing images and videos).
RFID data cubes handle the compression and multidimensional analysis of RFID
(i.e., radio-frequency identification) data. Text cubes and topic cubes were
developed for the application of vector-space models and generative language models,
respectively, in the analysis of multidimensional text databases (which contain both
structure attributes and narrative text attributes).
5.3.1 Sampling Cubes: OLAP-Based Mining on Sampling Data
When collecting data, we often collect only a subset of the data we would ideally like
to gather. In statistics, this is known as collecting a sample of the data population.
⁵ That is, the intersection of the TID lists for $(a_2, b_1)$, $(c_1)$, and $(e_2)$ is $\{4\}$.