Page 276 -
P. 276
#53
3:19 Page 239
2011/6/1
HAN
12-ch05-187-242-9780123814791
5.6 Exercises 239
(c) Discuss how to support quality drill-down given that some low-level cells may be
empty or contain too few data for reliable analysis.
5.13 The ranking cube was proposed for efficient computation of top-k (ranking) queries in
relational databases. Recently, researchers have proposed another kind of query, called a
skyline query. A skyline query returns all the objects p i such that p i is not dominated by any
other object p j , where dominance is defined as follows. Let the value of p i on dimension
d be v(p i ,d). We say p i is dominated by p j if and only if for each preference dimension
d, v(p j ,d) ≤ v(p i ,d), and there is at least one d where the equality does not hold.
(a) Design a ranking cube so that skyline queries can be processed efficiently.
(b) Skyline queries are sometimes too strict to be desirable to some users. One may
generalize the concept of skyline into generalized skyline as follows: Given a d-
dimensional database and a query q, the generalized skyline is the set of the following
objects: (1) the skyline objects and (2) the nonskyline objects that are -neighbors of a
skyline object, where r is an -neighbor of an object p if the distance between p and
r is no more than . Design a ranking cube to process generalized skyline queries
efficiently.
5.14 The ranking cube was designed to support top-k (ranking) queries in relational database
systems. However, ranking queries are also posed to data warehouses, where ranking is
on multidimensional aggregates instead of on measures of base facts. For example, con-
sider a product manager who is analyzing a sales database that stores the nationwide
sales history, organized by location and time. To make investment decisions, the man-
ager may pose the following query: “What are the top-10 (state, year) cells having the
largest total product sales?” He may further drill down and ask, “What are the top-10 (city,
month) cells?” Suppose the system can perform such partial materialization to derive two
types of materialized cuboids: a guiding cuboid and a supporting cuboid, where the for-
mer contains a number of guiding cells that provide concise, high-level data statistics
to guide the ranking query processing, whereas the latter provides inverted indices for
efficient online aggregation.
(a) Derive an efficient method for computing such aggregate ranking cubes.
(b) Extend your framework to handle more advanced measures. One such example
could be as follows. Consider an organization donation database, where donors
are grouped by “age,” “income,” and other attributes. Interesting questions include:
“Which age and income groups have made the top-k average amount of donation (per
donor)?” and “Which income group of donors has the largest standard deviation in the
donation amount?”
5.15 The prediction cube is a good example of multidimensional data mining in cube
space.
(a) Propose an efficient algorithm that computes prediction cubes in a given multidi-
mensional database.
(b) For what kind of classification models can your algorithm be applied? Explain.