

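                    [Figure: three-dimensional data cube. Axes: branch (A, B, C, D); item_type (home
                    entertainment, computer, phone, security); year (2008, 2009, 2010). Visible cell
                    values: home entertainment 568, computer 750, phone 150, security 50.]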

                    Figure 3.11 A data cube for sales at AllElectronics.



                               multidimensional aggregated information. For example, Figure 3.11 shows a data cube
                               for multidimensional analysis of sales data with respect to annual sales per item type
                               for each AllElectronics branch. Each cell holds an aggregate data value, corresponding
                               to the data point in multidimensional space. (For readability, only some cell values are
                               shown.) Concept hierarchies may exist for each attribute, allowing the analysis of data
                               at multiple abstraction levels. For example, a hierarchy for branch could allow branches
                               to be grouped into regions, based on their address. Data cubes provide fast access to
                               precomputed, summarized data, thereby benefiting online analytical processing as well
                               as data mining.
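
To make the idea of precomputed cells and concept hierarchies concrete, the following is a minimal sketch in plain Python. The sales rows, the region names, and the helper names `build_cube` and `roll_up_branch` are illustrative assumptions, not data or functions from the text. The sketch stores one summed value per (branch, item_type, year) cell and then re-aggregates along an assumed branch-to-region hierarchy.

```python
from collections import defaultdict

# Hypothetical transaction-level rows: (branch, item_type, year, amount).
# The amounts are made up for illustration only.
sales = [
    ("A", "computer", 2008, 400),
    ("A", "computer", 2008, 350),
    ("A", "phone", 2008, 150),
    ("B", "computer", 2009, 500),
    ("C", "security", 2010, 50),
    ("D", "phone", 2010, 120),
]

# Assumed concept hierarchy for the branch attribute: branch -> region.
branch_to_region = {"A": "East", "B": "East", "C": "West", "D": "West"}


def build_cube(rows):
    """Precompute one aggregate (sum of amount) per (branch, item_type, year) cell."""
    cube = defaultdict(int)
    for branch, item_type, year, amount in rows:
        cube[(branch, item_type, year)] += amount
    return dict(cube)


def roll_up_branch(cube, hierarchy):
    """Re-aggregate the cube one level up the branch hierarchy (branch -> region)."""
    rolled = defaultdict(int)
    for (branch, item_type, year), total in cube.items():
        rolled[(hierarchy[branch], item_type, year)] += total
    return dict(rolled)


cube = build_cube(sales)
by_region = roll_up_branch(cube, branch_to_region)

# Answering a query is now a dictionary lookup rather than a scan of the raw rows.
print(cube[("A", "computer", 2008)])
print(by_region[("West", "phone", 2010)])
```

Once the cells are precomputed, each query is a single lookup, which is the fast access to summarized data described above. The roll-up never increases the number of cells, since several branches collapse into one region.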
                                 The cube created at the lowest abstraction level is referred to as the base cuboid. The
                               base cuboid should correspond to an individual entity of interest such as sales or
                               customer. In other words, the lowest level should be usable, or useful, for the analysis. A cube
                               at the highest level of abstraction is the apex cuboid. For the sales data in Figure 3.11,
                               the apex cuboid would give one total—the total sales for all three years, for all item
                               types, and for all branches. Data cubes created for varying levels of abstraction are often
                               referred to as cuboids, so that a data cube may instead refer to a lattice of cuboids. Each
                               higher abstraction level further reduces the resulting data size. When replying to data
                               mining requests, the smallest available cuboid relevant to the given task should be used.
                               This issue is also addressed in Chapter 4.
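
The relationship between the base cuboid, the apex cuboid, and the lattice in between can be sketched as follows. This is a simplified illustration under stated assumptions: the rows are hypothetical, abstraction is varied only by dropping whole dimensions (not by climbing a concept hierarchy within a dimension), and the helper names `cuboid`, `lattice`, and `smallest_covering_cuboid` are invented for this example.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("branch", "item_type", "year")

# Hypothetical base-level rows; the sales figures are illustrative only.
rows = [
    {"branch": "A", "item_type": "computer", "year": 2008, "sales": 750},
    {"branch": "A", "item_type": "phone", "year": 2008, "sales": 150},
    {"branch": "B", "item_type": "computer", "year": 2009, "sales": 500},
    {"branch": "B", "item_type": "security", "year": 2010, "sales": 50},
]


def cuboid(rows, dims):
    """Sum sales grouped by the given subset of dimensions."""
    agg = defaultdict(int)
    for row in rows:
        agg[tuple(row[d] for d in dims)] += row["sales"]
    return dict(agg)


def lattice(rows, dimensions):
    """Build every cuboid, from the apex (no dimensions) to the base (all dimensions)."""
    return {
        dims: cuboid(rows, dims)
        for k in range(len(dimensions) + 1)
        for dims in combinations(dimensions, k)
    }


def smallest_covering_cuboid(cuboids, needed):
    """Pick the cuboid with the fewest cells that still contains every needed dimension."""
    candidates = [dims for dims in cuboids if set(needed) <= set(dims)]
    return min(candidates, key=lambda dims: (len(cuboids[dims]), len(dims)))


cuboids = lattice(rows, DIMENSIONS)
apex = cuboids[()]          # a single cell holding the grand total
base = cuboids[DIMENSIONS]  # one cell per (branch, item_type, year) combination

print(apex)
print(smallest_covering_cuboid(cuboids, ("year",)))
```

With three dimensions the lattice holds 2^3 = 8 cuboids. A request for sales per year can be answered from the one-dimensional year cuboid rather than from the larger base cuboid, which mirrors the advice to use the smallest available cuboid relevant to the task.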


                       3.5     Data Transformation and Data Discretization


                               This section presents methods of data transformation. In this preprocessing step, the
                               data are transformed or consolidated so that the resulting mining process may be more
                               efficient, and the patterns found may be easier to understand. Data discretization, a form
                               of data transformation, is also discussed.