Page 274 -
P. 274

3:19 Page 237
                                                            2011/6/1
                               12-ch05-187-242-9780123814791
                         HAN
                                                                                    #51
                                                                                    5.6 Exercises  237


                              (b) Modify your design in (a) to handle incremental data updates. Give the reasoning
                                  behind your new design.
                           5.6 When computing a cube of high dimensionality, we encounter the inherent curse of
                               dimensionality problem: There exists a huge number of subsets of combinations of
                               dimensions.
                               (a) Suppose that there are only two base cells, {(a 1 , a 2 , a 3 ,..., a 100 ) and (a 1 , a 2 ,
                                  b 3 ,..., b 100 )}, in a 100-D base cuboid. Compute the number of nonempty aggregate
                                  cells. Comment on the storage space and time required to compute these cells.
                              (b) Suppose we are to compute an iceberg cube from (a). If the minimum support count
                                  in the iceberg condition is 2, how many aggregate cells will there be in the iceberg
                                  cube? Show the cells.
                               (c) Introducing iceberg cubes will lessen the burden of computing trivial aggregate cells
                                  in a data cube. However, even with iceberg cubes, we could still end up having to
                                  compute a large number of trivial uninteresting cells (i.e., with small counts). Sup-
                                  pose that a database has 20 tuples that map to (or cover) the two following base
                                  cells in a 100-D base cuboid, each with a cell count of 10: {(a 1 , a 2 , a 3 ,..., a 100 ) : 10,
                                  (a 1 , a 2 , b 3 ,..., b 100 ) : 10}.
                                  i. Let the minimum support be 10. How many distinct aggregate cells will
                                     there be like the following: {(a 1 , a 2 , a 3 , a 4 ,..., a 99 , ∗) : 10,...,(a 1 , a 2 , ∗ , a 4 ,...,
                                     a 99 , a 100 ) : 10, ..., (a 1 , a 2 , a 3 , ∗ ,..., ∗ , ∗) : 10}?
                                  ii. If we ignore all the aggregate cells that can be obtained by replacing some con-
                                     stants with ∗’s while keeping the same measure value, how many distinct cells
                                     remain? What are the cells?

                           5.7 Propose an algorithm that computes closed iceberg cubes efficiently.
                           5.8 Suppose that we want to compute an iceberg cube for the dimensions, A, B, C, D, where
                               we wish to materialize all cells that satisfy a minimum support count of at least v, and
                               where cardinality(A) < cardinality(B) < cardinality(C) < cardinality(D). Show the BUC
                               processing tree (which shows the order in which the BUC algorithm explores a data
                               cube’s lattice, starting from all) for the construction of this iceberg cube.
                           5.9 Discuss how you might extend the Star-Cubing algorithm to compute iceberg cubes
                               where the iceberg condition tests for an avg that is no bigger than some value, v.
                          5.10 A flight data warehouse for a travel agent consists of six dimensions: traveler, departure
                               (city), departure time, arrival, arrival time, and flight; and two measures: count( ) and
                               avg fare( ), where avg fare( ) stores the concrete fare at the lowest level but the average fare
                               at other levels.
                               (a) Suppose the cube is fully materialized. Starting with the base cuboid [traveler, depar-
                                  ture, departure time, arrival, arrival time, flight], what specific OLAP operations
                                  (e.g., roll-up flight to airline) should one perform to list the average fare per month
                                  for each business traveler who flies American Airlines (AA) from Los Angeles in 2009?
   269   270   271   272   273   274   275   276   277   278   279