Page 252 -
P. 252
HAN
2011/6/1
3:19 Page 215
12-ch05-187-242-9780123814791
#29
5.2 Data Cube Computation Methods 215
array instead, which stores what we need to compute other measures. For example,
to compute average(), we let the ID measure array hold three elements, namely, (TID,
item count, sum), for each cell (line 3 of the shell fragment computation algorithm in
Figure 5.14). The average() measure for each aggregate cell can then be computed by
accessing only this ID measure array, using sum()/item count(). Considering a database
6
with 10 tuples, each taking 4 bytes each for TID, item count, and sum, the ID measure
array requires 12 MB, whereas the corresponding database of 60 dimensions will require
6
(60 + 3) × 4 × 10 = 252 MB (assuming each attribute value takes 4 bytes). Obviously,
ID measure array is a more compact data structure and is more likely to fit in memory
than the corresponding high-dimensional database.
To illustrate the design of the ID measure array, let’s look at Example 5.11.
Example 5.11 Computing cubes with the average() measure. Table 5.8 shows an example sales
database where each tuple has two associated values, such as item count and sum, where
item count is the count of items sold.
To compute a data cube for this database with the measure average(), we need to have
a TID list for each cell: {TID 1 ,...,TID n }. Because each TID is uniquely associated with a
particular set of measure values, all future computation just needs to fetch the measure
values associated with the tuples in the list. In other words, by keeping an ID measure
array in memory for online processing, we can handle complex algebraic measures, such
as average, variance, and standard deviation. Table 5.9 shows what exactly should be kept
for our example, which is substantially smaller than the database itself.
Table 5.8 Database with Two Measure Values
TID A B C D E item count sum
1 a 1 b 1 c 1 d 1 e 1 5 70
2 a 1 b 2 c 1 d 2 e 1 3 10
3 a 1 b 2 c 1 d 1 e 2 8 20
4 a 2 b 1 c 1 d 1 e 2 5 40
5 a 2 b 1 c 1 d 1 e 3 2 30
Table 5.9 Table 5.8 ID measure Array
TID item count sum
1 5 70
2 3 10
3 8 20
4 5 40
5 2 30