Page 262 -
P. 262

2011/6/1
                                                                                    #39
                                                                     3:19 Page 225
                               12-ch05-187-242-9780123814791
                         HAN
                                   5.3 Processing Advanced Kinds of Queries by Exploring Cube Technology  225


                               The subset should consist of relatively low-dimensional cuboids (that are commonly
                               queried) and cuboids that offer the most benefit to the user. The details are left to inter-
                               ested readers as an exercise. The method was tested on both real and synthetic data and
                               found to be efficient and effective in answering queries.



                         5.3.2 Ranking Cubes: Efficient Computation of Top-k Queries
                               The data cube helps not only online analytical processing of multidimensional queries
                               but also search and data mining. In this section, we introduce a new cube structure
                               called Ranking Cube and examine how it contributes to the efficient processing of top-k
                               queries. Instead of returning a large set of indiscriminative answers to a query, a top-k
                               query (or ranking query) returns only the best k results according to a user-specified
                               preference.
                                 The results are returned in ranked order so that the best is at the top. The user-
                               specified preference generally consists of two components: a selection condition and
                               a ranking function. Top-k queries are common in many applications like searching
                               web databases, k-nearest-neighbor searches with approximate matches, and similarity
                               queries in multimedia databases.

                 Example 5.16 A top-k query. Consider an online used-car database, R, that maintains the following
                               information for each car: producer (e.g., Ford, Honda), model (e.g., Taurus, Accord),
                               type (e.g., sedan, convertible), color (e.g., red, silver), transmission (e.g., auto, manual),
                               price, mileage, and so on. A typical top-k query over this database is
                                  Q 1 :  select top 5 * from R
                                       where producer = “Ford” and type = “sedan”
                                                                         2
                                                         2
                                       order by (price − 10K) + (mileage − 30K) asc
                               Within the dimensions (or attributes) for R, producer and type are used here as selection
                               dimensions. The ranking function is given in the order-by clause. It specifies the rank-
                               ing dimensions, price and mileage. Q 1 searches for the top-5 sedans made by Ford. The
                               entries found are ranked or sorted in ascending (asc) order, according to the ranking
                               function. The ranking function is formulated so that entries that have price and mileage
                               closest to the user’s specified values of $10K and 30K, respectively, appear toward the
                               top of the list.

                                 The database may have many dimensions that could be used for selection, describ-
                               ing, for example, whether a car has power windows, air conditioning, or a sunroof. Users
                               may pick any subset of dimensions and issue a top-k query using their preferred rank-
                               ing function. There are many other similar application scenarios. For example, when
                               searching for hotels, ranking functions are often constructed based on price and distance
                               to an area of interest. Selection conditions can be imposed on, say, the hotel location
   257   258   259   260   261   262   263   264   265   266   267