Page 252 -
P. 252

HAN
                                                            2011/6/1
                                                                     3:19 Page 215
                               12-ch05-187-242-9780123814791
                                                                                    #29
                                                               5.2 Data Cube Computation Methods  215


                               array instead, which stores what we need to compute other measures. For example,
                               to compute average(), we let the ID measure array hold three elements, namely, (TID,
                               item count, sum), for each cell (line 3 of the shell fragment computation algorithm in
                               Figure 5.14). The average() measure for each aggregate cell can then be computed by
                               accessing only this ID measure array, using sum()/item count(). Considering a database
                                     6
                               with 10 tuples, each taking 4 bytes each for TID, item count, and sum, the ID measure
                               array requires 12 MB, whereas the corresponding database of 60 dimensions will require
                                             6
                               (60 + 3) × 4 × 10 = 252 MB (assuming each attribute value takes 4 bytes). Obviously,
                               ID measure array is a more compact data structure and is more likely to fit in memory
                               than the corresponding high-dimensional database.
                                 To illustrate the design of the ID measure array, let’s look at Example 5.11.

                 Example 5.11 Computing cubes with the average() measure. Table 5.8 shows an example sales
                               database where each tuple has two associated values, such as item count and sum, where
                               item count is the count of items sold.
                                 To compute a data cube for this database with the measure average(), we need to have
                               a TID list for each cell: {TID 1 ,...,TID n }. Because each TID is uniquely associated with a
                               particular set of measure values, all future computation just needs to fetch the measure
                               values associated with the tuples in the list. In other words, by keeping an ID measure
                               array in memory for online processing, we can handle complex algebraic measures, such
                               as average, variance, and standard deviation. Table 5.9 shows what exactly should be kept
                               for our example, which is substantially smaller than the database itself.


                     Table 5.8 Database with Two Measure Values

                               TID   A   B    C    D    E   item count  sum
                               1     a 1  b 1  c 1  d 1  e 1  5        70
                               2     a 1  b 2  c 1  d 2  e 1  3        10
                               3     a 1  b 2  c 1  d 1  e 2  8        20
                               4     a 2  b 1  c 1  d 1  e 2  5        40
                               5     a 2  b 1  c 1  d 1  e 3  2        30



                     Table 5.9 Table 5.8 ID measure Array

                               TID    item count  sum
                               1      5           70
                               2      3           10
                               3      8           20
                               4      5           40
                               5      2           30
   247   248   249   250   251   252   253   254   255   256   257