Page 200 -
P. 200

11-ch04-125-186-9780123814791
                                                                                    #39
                                                                     3:17 Page 163
                                                            2011/6/1
                         HAN
                                                                4.4 Data Warehouse Implementation  163


                                 Suppose that there are 360 time values, 100 items, 50 branches, 30 locations, and 10
                               million sales tuples in the sales star data cube. If the sales fact table has recorded sales
                               for only 30 items, the remaining 70 items will obviously not participate in joins. If join
                               indices are not used, additional I/Os have to be performed to bring the joining portions
                               of the fact table and the dimension tables together.

                                 To further speed up query processing, the join indexing and the bitmap indexing
                               methods can be integrated to form bitmapped join indices.



                         4.4.3 Efficient Processing of OLAP Queries
                               The purpose of materializing cuboids and constructing OLAP index structures is to
                               speed up query processing in data cubes. Given materialized views, query processing
                               should proceed as follows:

                               1. Determine which operations should be performed on the available cuboids: This
                                 involves transforming any selection, projection, roll-up (group-by), and drill-down
                                 operations specified in the query into corresponding SQL and/or OLAP operations.
                                 For example, slicing and dicing a data cube may correspond to selection and/or
                                 projection operations on a materialized cuboid.
                               2. Determine to which materialized cuboid(s) the relevant operations should be
                                 applied: This involves identifying all of the materialized cuboids that may poten-
                                 tially be used to answer the query, pruning the set using knowledge of “domi-
                                 nance” relationships among the cuboids, estimating the costs of using the remaining
                                 materialized cuboids, and selecting the cuboid with the least cost.

                  Example 4.9 OLAP query processing. Suppose that we define a data cube for AllElectronics of the
                               form “sales cube [time, item, location]: sum(sales in dollars).” The dimension hierarchies
                               used are “day < month < quarter < year” for time; “item name < brand < type” for
                               item; and “street < city < province or state < country” for location.
                                 Suppose that the query to be processed is on {brand, province or state}, with the
                               selection constant “year = 2010.” Also, suppose that there are four materialized cuboids
                               available, as follows:

                                 cuboid 1: {year, item name, city}
                                 cuboid 2: {year, brand, country}
                                 cuboid 3: {year, brand, province or state}
                                 cuboid 4: {item name, province or state}, where year = 2010

                                 “Which of these four cuboids should be selected to process the query?” Finer-granularity
                               data cannot be generated from coarser-granularity data. Therefore, cuboid 2 cannot be
                               used because country is a more general concept than province or state. Cuboids 1, 3, and
                               4 can be used to process the query because (1) they have the same set or a superset of the
   195   196   197   198   199   200   201   202   203   204   205