





5 Data Cube Technology










Data warehouse systems provide online analytical processing (OLAP) tools for interactive
analysis of multidimensional data at varied granularity levels. OLAP tools typically use
the data cube and a multidimensional data model to provide flexible access to summarized
data. For example, a data cube can store precomputed measures (like count() and
total_sales()) for multiple combinations of data dimensions (like item, region, and customer).
Users can pose OLAP queries on the data. They can also interactively explore the data
in a multidimensional way through OLAP operations like drill-down (to see more specialized
data such as total sales per city) or roll-up (to see the data at a more generalized
level such as total sales per country).
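To make the idea of precomputed measures concrete, here is a minimal sketch in Python. It is not a method from this chapter: the fact table `SALES`, its values, and the helper names `aggregate` and `build_cube` are all hypothetical, invented for illustration. The sketch precomputes count and total sales for every combination of two dimensions, then answers the same question at city level (drill-down) and at country level (roll-up).

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: one row per sale (values invented for illustration).
SALES = [
    {"item": "TV", "city": "Vancouver", "country": "Canada", "amount": 800.0},
    {"item": "TV", "city": "Toronto", "country": "Canada", "amount": 750.0},
    {"item": "phone", "city": "Vancouver", "country": "Canada", "amount": 300.0},
    {"item": "phone", "city": "Chicago", "country": "USA", "amount": 350.0},
]


def aggregate(rows, dims):
    """Group rows by the given dimensions; return {key: (count, total_sales)}."""
    cells = defaultdict(lambda: [0, 0.0])
    for row in rows:
        key = tuple(row[d] for d in dims)
        cells[key][0] += 1               # count() measure
        cells[key][1] += row["amount"]   # total_sales() measure
    return {key: (count, total) for key, (count, total) in cells.items()}


def build_cube(rows, dims):
    """Precompute one aggregate table per subset of dims (naive, for illustration)."""
    return {
        subset: aggregate(rows, subset)
        for k in range(len(dims) + 1)
        for subset in combinations(dims, k)
    }


cube = build_cube(SALES, ("item", "city"))

# Drill-down: total sales per city. Roll-up: total sales per country.
per_city = aggregate(SALES, ("city",))
per_country = aggregate(SALES, ("country",))

print(cube[("item",)])   # measures grouped by item only
print(per_city)
print(per_country)
```

Once `cube` is built, a query such as "total sales of TVs" becomes a dictionary lookup rather than a scan of the fact table, which is the point of precomputation. The sketch recomputes each combination from scratch, so it is only suitable for toy data.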
Although the data cube concept was originally intended for OLAP, it is also useful
for data mining. Multidimensional data mining is an approach to data mining
that integrates OLAP-based data analysis with knowledge discovery techniques. It is
also known as exploratory multidimensional data mining and online analytical mining
(OLAM). It searches for interesting patterns by exploring the data in multidimensional
space. This gives users the freedom to dynamically focus on any subset of interesting
dimensions. Users can interactively drill down or roll up to varying abstraction levels to
find classification models, clusters, predictive rules, and outliers.
This chapter focuses on data cube technology. In particular, we study methods for
data cube computation and methods for multidimensional data analysis. Precomputing
a data cube (or parts of a data cube) allows for fast access to summarized data.
Given the high dimensionality of most data, multidimensional analysis can run into
performance bottlenecks. Therefore, it is important to study data cube computation
techniques. Luckily, data cube technology provides many effective and scalable methods
for cube computation. Studying these methods will also help us understand and
further develop scalable methods for other data mining tasks such as the
discovery of frequent patterns (Chapters 6 and 7).
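A rough way to see where the bottleneck comes from is to count the group-bys a full cube contains. The sketch below assumes the standard counting argument: a cube over n dimensions without concept hierarchies has one cuboid per subset of dimensions, 2^n in total, and if dimension i has L_i hierarchy levels (plus a virtual top level "all") the count becomes the product of (L_i + 1). The function names `cuboids` and `cuboid_count` and the level counts used are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import prod  # available in Python 3.8 and later


def cuboids(dims):
    """List every cuboid (subset of dimensions), from the empty apex to the base."""
    return [c for k in range(len(dims) + 1) for c in combinations(dims, k)]


def cuboid_count(levels_per_dim):
    """Count cuboids when dimension i has levels_per_dim[i] levels, plus 'all'."""
    return prod(levels + 1 for levels in levels_per_dim)


lattice = cuboids(("item", "region", "customer"))
print(len(lattice))            # 8 cuboids for 3 dimensions
print(cuboid_count([1] * 10))  # 1024 cuboids for 10 flat dimensions
print(cuboid_count([4] * 10))  # 9765625 cuboids with 4 levels per dimension
```

Ten dimensions with four levels each is an illustrative assumption, not a figure from the text, but it shows how quickly the number of cuboids outgrows what can be naively precomputed.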
We begin in Section 5.1 with preliminary concepts for cube computation. These summarize
the data cube notion as a lattice of cuboids, and describe basic forms of cube
materialization. General strategies for cube computation are given. Section 5.2 follows
with an in-depth look at specific methods for data cube computation. We study both
full materialization (i.e., where all the cuboids representing a data cube are precomputed

Data Mining: Concepts and Techniques. © 2012 Elsevier Inc. All rights reserved.