Page 200 -
P. 200
11-ch04-125-186-9780123814791
#39
3:17 Page 163
2011/6/1
HAN
4.4 Data Warehouse Implementation 163
Suppose that there are 360 time values, 100 items, 50 branches, 30 locations, and 10
million sales tuples in the sales star data cube. If the sales fact table has recorded sales
for only 30 items, the remaining 70 items will obviously not participate in joins. If join
indices are not used, additional I/Os have to be performed to bring the joining portions
of the fact table and the dimension tables together.
To further speed up query processing, the join indexing and the bitmap indexing
methods can be integrated to form bitmapped join indices.
4.4.3 Efficient Processing of OLAP Queries
The purpose of materializing cuboids and constructing OLAP index structures is to
speed up query processing in data cubes. Given materialized views, query processing
should proceed as follows:
1. Determine which operations should be performed on the available cuboids: This
involves transforming any selection, projection, roll-up (group-by), and drill-down
operations specified in the query into corresponding SQL and/or OLAP operations.
For example, slicing and dicing a data cube may correspond to selection and/or
projection operations on a materialized cuboid.
2. Determine to which materialized cuboid(s) the relevant operations should be
applied: This involves identifying all of the materialized cuboids that may poten-
tially be used to answer the query, pruning the set using knowledge of “domi-
nance” relationships among the cuboids, estimating the costs of using the remaining
materialized cuboids, and selecting the cuboid with the least cost.
Example 4.9 OLAP query processing. Suppose that we define a data cube for AllElectronics of the
form “sales cube [time, item, location]: sum(sales in dollars).” The dimension hierarchies
used are “day < month < quarter < year” for time; “item name < brand < type” for
item; and “street < city < province or state < country” for location.
Suppose that the query to be processed is on {brand, province or state}, with the
selection constant “year = 2010.” Also, suppose that there are four materialized cuboids
available, as follows:
cuboid 1: {year, item name, city}
cuboid 2: {year, brand, country}
cuboid 3: {year, brand, province or state}
cuboid 4: {item name, province or state}, where year = 2010
“Which of these four cuboids should be selected to process the query?” Finer-granularity
data cannot be generated from coarser-granularity data. Therefore, cuboid 2 cannot be
used because country is a more general concept than province or state. Cuboids 1, 3, and
4 can be used to process the query because (1) they have the same set or a superset of the