Page 263 -
P. 263

12-ch05-187-242-9780123814791
                         HAN

          226   Chapter 5 Data Cube Technology              2011/6/1  3:19 Page 226  #40



                         district, the star rating, and whether the hotel offers complimentary treats or Internet
                         access. The ranking functions may be linear, quadratic, or any other form.
                           As shown in the preceding examples, individual users may not only propose ad hoc
                         ranking functions, but also have different data subsets of interest. Users often want to
                         thoroughly study the data via multidimensional analysis of the top-k query results. For
                         example, if unsatisfied by the top-5 results returned by Q 1 , the user may roll up on
                         the producer dimension to check the top-5 results on all sedans. The dynamic nature
                         of the problem imposes a great challenge to researchers. OLAP requires offline pre-
                         computation so that multidimensional analysis can be performed on-the-fly, yet the ad
                         hoc ranking functions prohibit full materialization. A natural compromise is to adopt a
                         semi-offline materialization and semi-online computation model.
                           Suppose a relation R has selection dimensions (A 1 ,A 2 , ...,A S ) and ranking dimen-
                         sions (N 1 ,N 2 ,...,N R ). Values in each ranking dimension can be partitioned into multi-
                         ple intervals according to the data and expected query distributions. Regarding the price
                         of used cars, for example, we may have, say, these four partitions (or value ranges): ≤ 5K,
                         [5 − 10K), [10 − 15K), and ≥ 15K. A ranking cube can be constructed by performing
                         multidimensional aggregations on selection dimensions. We can store the count for each
                         partition of each ranking dimension, thereby making the cube “rank-aware.” The top-k
                         queries can be answered by first accessing the cells in the more preferred value ranges
                         before consulting the cells in the less preferred value ranges.


           Example 5.17 Using a ranking cube to answer a top-k query. Suppose Table 5.11 shows C MT , a mate-
                         rialized (i.e., precomputed) cuboid of a ranking cube for used-car sales. The cuboid,
                         C MT , is for the selection dimensions producer and type. It shows the count and corre-
                         sponding tuple IDs (TIDs) for various partitions of the ranking dimensions, price and
                         mileage.
                           Query Q 1 can be answered by using a selection condition to select the appropriate
                         selection dimension values (i.e., producer = “Ford” and type = “sedan”) in cuboid C MT .
                                                                                2
                                                                2
                         In addition, the ranking function “(price − 10K) + (mileage − 30K) ” is used to find
                         the tuples that most closely match the user’s criteria. If there are not enough matching
                         tuples found in the closest matching cells, the next closest matching cells will need to be
                         accessed. We may even drill down to the corresponding lower-level cells to see the count
                         distributions of cells that match the ranking function and additional criteria regarding,
                         say, model, maintenance situation, or other loaded features. Only users who really want
                         to see more detailed information, such as interior photos, will need to access the physical
                         records stored in the database.


              Table 5.11 Cuboid of a Ranking Cube for Used-Car Sales
                          producer  type   price  mileage  count    TIDs
                            Ford   sedan   <5K    30–40K     7     t 6 , ..., t 68
                            Ford   sedan   5–10K  30–40K    50    t 15 , ..., t 152
                           Honda   sedan  10–15K  30–40K    20     t 8 , ..., t 32
                            ...     ...     ...     ...     ...      ...
   258   259   260   261   262   263   264   265   266   267   268