Page 234 -
P. 234

2011/6/1
                               12-ch05-187-242-9780123814791
                                                                                    #11
                                                                     3:19 Page 197
                         HAN
                                                               5.2 Data Cube Computation Methods  197


                                 The base cuboid, denoted by ABC (from which all the other cuboids are directly or
                                 indirectly computed). This cube is already computed and corresponds to the given
                                 3-D array.
                                 The 2-D cuboids, AB, AC, and BC, which respectively correspond to the group-by’s
                                 AB, AC, and BC. These cuboids must be computed.
                                 The 1-D cuboids, A, B, and C, which respectively correspond to the group-by’s A, B,
                                 and C. These cuboids must be computed.

                                 The 0-D (apex) cuboid, denoted by all, which corresponds to the group-by (); that
                                 is, there is no group-by here. This cuboid must be computed. It consists of only one
                                 value. If, say, the data cube measure is count, then the value to be computed is simply
                                 the total count of all the tuples in ABC.

                                 Let’s look at how the multiway array aggregation technique is used in this computa-
                               tion. There are many possible orderings with which chunks can be read into memory
                               for use in cube computation. Consider the ordering labeled from 1 to 64, shown in
                               Figure 5.3. Suppose we want to compute the b 0 c 0 chunk of the BC cuboid. We allocate
                               space for this chunk in chunk memory. By scanning ABC chunks 1 through 4, the b 0 c 0
                               chunk is computed. That is, the cells for b 0 c 0 are aggregated over a 0 to a 3 . The chunk
                               memory can then be assigned to the next chunk, b 1 c 0 , which completes its aggregation
                               after the scanning of the next four ABC chunks: 5 through 8. Continuing in this way,
                               the entire BC cuboid can be computed. Therefore, only one BC chunk needs to be in
                               memory at a time, for the computation of all the BC chunks.
                                 In computing the BC cuboid, we will have scanned each of the 64 chunks. “Is there a
                               way to avoid having to rescan all of these chunks for the computation of other cuboids such
                               as AC and AB?” The answer is, most definitely, yes. This is where the “multiway com-
                               putation” or “simultaneous aggregation” idea comes in. For example, when chunk 1
                               (i.e., a 0 b 0 c 0 ) is being scanned (say, for the computation of the 2-D chunk b 0 c 0 of BC, as
                               described previously), all of the other 2-D chunks relating to a 0 b 0 c 0 can be simultane-
                               ously computed. That is, when a 0 b 0 c 0 is being scanned, each of the three chunks (b 0 c 0 ,
                               a 0 c 0 , and a 0 b 0 ) on the three 2-D aggregation planes (BC, AC, and AB) should be com-
                               puted then as well. In other words, multiway computation simultaneously aggregates to
                               each of the 2-D planes while a 3-D chunk is in memory.
                                 Now let’s look at how different orderings of chunk scanning and of cuboid compu-
                               tation can affect the overall data cube computation efficiency. Recall that the size of the
                               dimensions A, B, and C is 40, 400, and 4000, respectively. Therefore, the largest 2-D
                               plane is BC (of size 400 × 4000 = 1,600,000). The second largest 2-D plane is AC (of
                               size 40 × 4000 = 160,000). AB is the smallest 2-D plane (of size 40 × 400 = 16,000).
                                 Suppose that the chunks are scanned in the order shown, from chunks 1 to 64. As
                               previously mentioned, b 0 c 0 is fully aggregated after scanning the row containing chunks
                               1 through 4; b 1 c 0 is fully aggregated after scanning chunks 5 through 8, and so on. Thus,
                               we need to scan four chunks of the 3-D array to fully compute one chunk of the BC
                               cuboid (where BC is the largest of the 2-D planes). In other words, by scanning in this
   229   230   231   232   233   234   235   236   237   238   239