
          218   Chapter 5 Data Cube Technology



                         and E, where AB has the instantiation (a2, b1). The fetch of the TID lists for these
                         partitions returns (a2, b1) : {4, 5}, (c1) : {1, 2, 3, 4, 5}, and
                         {(e1 : {1, 2}), (e2 : {3, 4}), (e3 : {5})}, respectively. The intersection of these
                         corresponding TID lists contains a cuboid with two tuples:
                         {(c1, e2) : {4}, (c1, e3) : {5}} (footnote 5). This base cuboid can be used to compute
                         the 2-D data cube, which is trivial.
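
                         The intersection step above can be sketched in a few lines of Python. This is only
                         an illustrative sketch, not the Frag-Shells implementation: the function name
                         base_cuboid and the dictionary layout of the TID lists are assumptions made for
                         this example, and the TID values are those quoted in the text.

```python
from itertools import product

# TID lists fetched from the shell fragments (values from the example in the text).
ab_tids = {4, 5}                                   # instantiation (a2, b1) of fragment AB
c_tids = {"c1": {1, 2, 3, 4, 5}}                   # inquired dimension C
e_tids = {"e1": {1, 2}, "e2": {3, 4}, "e3": {5}}   # inquired dimension E


def base_cuboid(instantiated, *inquired):
    """Hypothetical helper: intersect the instantiated TID list with every
    combination of values from the inquired dimensions, keeping only the
    cells whose intersection is non-empty."""
    cells = {}
    for combo in product(*(dim.items() for dim in inquired)):
        values = tuple(value for value, _ in combo)
        tids = set(instantiated)
        for _, tid_list in combo:
            tids &= tid_list          # set intersection of TID lists
        if tids:
            cells[values] = tids
    return cells


cuboid = base_cuboid(ab_tids, c_tids, e_tids)
print(cuboid)  # {('c1', 'e2'): {4}, ('c1', 'e3'): {5}}
```

                         The cell (c1, e1) is dropped because its intersection {4, 5} ∩ {1, 2, 3, 4, 5} ∩ {1, 2}
                         is empty, which leaves the two tuples given in the text.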

                           For large data sets, a fragment size of 2 or 3 typically yields reasonable storage
                         requirements for the shell fragments and fast query response time. Querying with
                         shell fragments is substantially faster than answering queries using precomputed data
                         cubes that are stored on disk. In comparison to full cube computation, Frag-Shells is
                         recommended if there are fewer than four inquired dimensions. Otherwise, more efficient
                         algorithms, such as Star-Cubing, can be used for fast online cube computation.
                         Frag-Shells can be easily extended to allow incremental updates, the details of which
                         are left as an exercise.

                 5.3     Processing Advanced Kinds of Queries by Exploring Cube Technology

                         Data cubes are not confined to the simple multidimensional structure illustrated in the
                         previous section for typical business data warehouse applications. The methods described
                         in this section further develop data cube technology for effective processing of advanced
                         kinds of queries. Section 5.3.1 explores sampling cubes. This extension of data cube
                         technology can be used to answer queries on sample data, such as survey data, which
                         represent a sample or subset of a target data population of interest. Section 5.3.2
                         explains how ranking cubes can be computed to answer top-k queries, such as “find the
                         top 5 cars,” according to some user-specified criteria.
                           The basic data cube structure has been further extended for various sophisticated
                         data types and new applications. Examples include spatial data cubes for the design
                         and implementation of geospatial data warehouses, and multimedia data cubes for the
                         multidimensional analysis of multimedia data (those containing images and videos).
                         RFID data cubes handle the compression and multidimensional analysis of RFID (i.e.,
                         radio-frequency identification) data. Text cubes and topic cubes were developed for
                         the application of vector-space models and generative language models, respectively,
                         in the analysis of multidimensional text databases (which contain both structure
                         attributes and narrative text attributes).

                   5.3.1 Sampling Cubes: OLAP-Based Mining on Sampling Data
                         When collecting data, we often collect only a subset of the data we would ideally like
                         to gather. In statistics, this is known as collecting a sample of the data population.


                         5 That is, the intersection of the TID lists for (a2, b1), (c1), and (e2) is {4}.