Chapter 5 Data Cube Technology

5.16 Multifeature cubes allow us to construct interesting data cubes based on rather
sophisticated query conditions. Construct the following multifeature cubes by
translating the user requests below into queries using the form introduced in this
textbook.
(a) Construct a smart shopper cube where a shopper is smart if at least 10% of the goods
she buys in each shopping trip are on sale.
(b) Construct a data cube for best-deal products where best-deal products are those
products for which the price is the lowest for this product in the given month.
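
To make the condition in (a) concrete before writing the cube query, the "smart shopper" test can be sketched in plain Python. This only illustrates the predicate; it is not the multifeature query form the exercise asks for. The record layout `(shopper, trip_id, on_sale)` and the function name `smart_shoppers` are assumptions made for the illustration.

```python
from collections import defaultdict

def smart_shoppers(purchases, threshold=0.10):
    """Return the shoppers for whom every trip has at least `threshold` of items on sale.

    purchases: iterable of (shopper, trip_id, on_sale) tuples, one per item bought
    (a hypothetical layout, not taken from the textbook).
    """
    totals = defaultdict(int)    # (shopper, trip) -> number of items bought
    on_sale = defaultdict(int)   # (shopper, trip) -> number of those items on sale
    for shopper, trip, sale in purchases:
        totals[(shopper, trip)] += 1
        if sale:
            on_sale[(shopper, trip)] += 1

    # Start with every shopper, then drop anyone with a trip below the threshold.
    smart = {shopper for shopper, _ in totals}
    for (shopper, trip), n in totals.items():
        if on_sale[(shopper, trip)] / n < threshold:
            smart.discard(shopper)
    return smart
```

The condition is checked per trip and must hold for all trips of a shopper, which is why one aggregate per (shopper, trip) group is needed before the shopper-level decision.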

5.17 Discovery-driven cube exploration is a desirable way to mark interesting points among
a large number of cells in a data cube. Individual users may have different views on
whether a point should be considered interesting enough to be marked. Suppose one
would like to mark those objects whose z-score has an absolute value over 2 in every
row and column in a d-dimensional plane.
(a) Derive an efficient computation method to identify such points during the data cube
computation.
(b) Suppose a partially materialized cube has (d − 1)-dimensional and (d + 1)-dimensional
cuboids materialized but not the d-dimensional one. Derive an efficient
method to mark those (d − 1)-dimensional cells with d-dimensional children that
contain such marked points.
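
The marking criterion in Exercise 5.17 can be illustrated on a single two-dimensional plane. The sketch below assumes one possible reading of the criterion: a cell is marked when its z-score exceeds 2 in absolute value both within its row and within its column. It is a direct check, not the efficient method the exercise asks for, and the function name `mark_outliers` and the list-of-lists layout are assumptions.

```python
from statistics import mean, pstdev

def mark_outliers(plane, limit=2.0):
    """Return (row, col) positions whose |z| exceeds `limit` in both row and column.

    plane: rectangular list of lists of numbers (one 2-D slice of a cube).
    """
    n_rows, n_cols = len(plane), len(plane[0])
    # Population mean and standard deviation of every row and every column.
    row_stats = [(mean(r), pstdev(r)) for r in plane]
    cols = [[plane[i][j] for i in range(n_rows)] for j in range(n_cols)]
    col_stats = [(mean(c), pstdev(c)) for c in cols]

    def z(x, mu, sigma):
        # A constant row or column has no spread, so nothing in it is unusual.
        return 0.0 if sigma == 0 else (x - mu) / sigma

    marked = []
    for i in range(n_rows):
        for j in range(n_cols):
            x = plane[i][j]
            if abs(z(x, *row_stats[i])) > limit and abs(z(x, *col_stats[j])) > limit:
                marked.append((i, j))
    return marked
```

Because the mean and standard deviation can be derived from a count, a sum, and a sum of squares, the statistics used here are algebraic measures. That observation is one possible starting point for part (a).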

5.7 Bibliographic Notes

Efficient computation of multidimensional aggregates in data cubes has been studied
by many researchers. Gray, Chaudhuri, Bosworth, et al. [GCB+97] proposed cube-by as
a relational aggregation operator generalizing group-by, crosstabs, and subtotals, and
categorized data cube measures into three categories: distributive, algebraic, and
holistic. Harinarayan, Rajaraman, and Ullman [HRU96] proposed a greedy algorithm for
the partial materialization of cuboids in the computation of a data cube. Sarawagi and
Stonebraker [SS94] developed a chunk-based computation technique for the efficient
organization of large multidimensional arrays. Agarwal, Agrawal, Deshpande, et al.
[AAD+96] proposed several guidelines for efficient computation of multidimensional
aggregates for ROLAP servers.
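
As a rough illustration of how cube-by generalizes group-by, the sketch below computes one group-by for every subset of the dimensions and stores the results in a single table, with a placeholder standing in for each rolled-up dimension. It is a naive version that rescans the data once per subset, not any of the optimized algorithms cited above. The names `cube_by` and `ALL`, and the choice of SUM as the measure, are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # placeholder for a dimension that has been aggregated away

def cube_by(rows, n_dims):
    """Compute SUM of the measure for every subset of the n_dims dimensions.

    rows: list of tuples (d_0, ..., d_{n_dims-1}, measure).
    Returns a dict mapping cell tuples (ALL in rolled-up positions) to sums.
    """
    cube = defaultdict(int)
    for k in range(n_dims + 1):
        # Each subset of dimensions corresponds to one ordinary group-by.
        for kept in combinations(range(n_dims), k):
            for row in rows:
                cell = tuple(row[i] if i in kept else ALL for i in range(n_dims))
                cube[cell] += row[n_dims]
    return dict(cube)
```

SUM is a distributive measure, so in practice coarser cells could be derived from finer ones rather than from the base rows; this sketch skips that optimization for clarity.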
The chunk-based MultiWay array aggregation method for data cube computation in
MOLAP was proposed in Zhao, Deshpande, and Naughton [ZDN97]. Ross and Srivastava
[RS97] developed a method for computing sparse data cubes. Iceberg queries are
first described in Fang, Shivakumar, Garcia-Molina, et al. [FSGM+98]. BUC, a scalable
method that computes iceberg cubes from the apex cuboid downwards, was introduced
by Beyer and Ramakrishnan [BR99]. Han, Pei, Dong, and Wang [HPDW01] introduced
an H-Cubing method for computing iceberg cubes with complex measures using an
H-tree structure.
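
The idea of computing an iceberg cube from the apex cuboid downwards can be sketched as a recursive partitioning of the data, in the spirit of BUC. This is a simplified illustration with COUNT as the measure and a minimum support threshold; it is not Beyer and Ramakrishnan's implementation and omits the optimizations of the published algorithm. The names `buc`, `expand`, and `ALL` are assumptions.

```python
from collections import defaultdict

ALL = "*"  # placeholder for a dimension that has been aggregated away

def buc(rows, n_dims, min_sup):
    """Return {cell: count} for every cell whose count is at least min_sup.

    rows: list of tuples with at least n_dims dimension values each.
    """
    result = {}

    def expand(part, cell, start):
        # `part` holds the tuples matching `cell`, so its size is the cell's count.
        result[tuple(cell)] = len(part)
        for d in range(start, n_dims):
            # Partition the current tuples on dimension d.
            groups = defaultdict(list)
            for row in part:
                groups[row[d]].append(row)
            for value, sub in groups.items():
                # A partition below min_sup cannot yield any qualifying
                # descendant cell, so it is pruned without recursion.
                if len(sub) >= min_sup:
                    cell[d] = value
                    expand(sub, cell, d + 1)
            cell[d] = ALL  # restore before moving to the next dimension

    if len(rows) >= min_sup:
        expand(list(rows), [ALL] * n_dims, 0)
    return result
```

The pruning step is what makes the approach suited to iceberg cubes: since a count can only shrink as more dimensions are fixed, a partition that fails the threshold can be discarded together with all of its more specific cells.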
The Star-Cubing method for computing iceberg cubes with a dynamic star-tree structure
was introduced by Xin, Han, Li, and Wah [XHLW03]. MM-Cubing, an efficient
