Page 276 -
P. 276

#53
                                                                     3:19 Page 239
                                                            2011/6/1
                         HAN
                               12-ch05-187-242-9780123814791
                                                                                    5.6 Exercises  239


                               (c) Discuss how to support quality drill-down given that some low-level cells may be
                                  empty or contain too few data for reliable analysis.
                          5.13 The ranking cube was proposed for efficient computation of top-k (ranking) queries in
                               relational databases. Recently, researchers have proposed another kind of query, called a
                               skyline query. A skyline query returns all the objects p i such that p i is not dominated by any
                               other object p j , where dominance is defined as follows. Let the value of p i on dimension
                               d be v(p i ,d). We say p i is dominated by p j if and only if for each preference dimension
                               d, v(p j ,d) ≤ v(p i ,d), and there is at least one d where the equality does not hold.
                               (a) Design a ranking cube so that skyline queries can be processed efficiently.
                              (b) Skyline queries are sometimes too strict to be desirable to some users. One may
                                  generalize the concept of skyline into generalized skyline as follows: Given a d-
                                  dimensional database and a query q, the generalized skyline is the set of the following
                                  objects: (1) the skyline objects and (2) the nonskyline objects that are  -neighbors of a
                                  skyline object, where r is an  -neighbor of an object p if the distance between p and
                                  r is no more than  . Design a ranking cube to process generalized skyline queries
                                  efficiently.
                          5.14 The ranking cube was designed to support top-k (ranking) queries in relational database
                               systems. However, ranking queries are also posed to data warehouses, where ranking is
                               on multidimensional aggregates instead of on measures of base facts. For example, con-
                               sider a product manager who is analyzing a sales database that stores the nationwide
                               sales history, organized by location and time. To make investment decisions, the man-
                               ager may pose the following query: “What are the top-10 (state, year) cells having the
                               largest total product sales?” He may further drill down and ask, “What are the top-10 (city,
                               month) cells?” Suppose the system can perform such partial materialization to derive two
                               types of materialized cuboids: a guiding cuboid and a supporting cuboid, where the for-
                               mer contains a number of guiding cells that provide concise, high-level data statistics
                               to guide the ranking query processing, whereas the latter provides inverted indices for
                               efficient online aggregation.
                               (a) Derive an efficient method for computing such aggregate ranking cubes.
                              (b) Extend your framework to handle more advanced measures. One such example
                                  could be as follows. Consider an organization donation database, where donors
                                  are grouped by “age,” “income,” and other attributes. Interesting questions include:
                                  “Which age and income groups have made the top-k average amount of donation (per
                                  donor)?” and “Which income group of donors has the largest standard deviation in the
                                  donation amount?”
                          5.15 The prediction cube is a good example of multidimensional data mining in cube
                               space.
                               (a) Propose an efficient algorithm that computes prediction cubes in a given multidi-
                                  mensional database.
                              (b) For what kind of classification models can your algorithm be applied? Explain.
   271   272   273   274   275   276   277   278   279   280   281