                                                               4.3 Data Warehouse Design and Usage  155


                         4.3.4 From Online Analytical Processing
                               to Multidimensional Data Mining
                               The data mining field has conducted substantial research regarding mining on vari-
                               ous data types, including relational data, data from data warehouses, transaction data,
                               time-series data, spatial data, text data, and flat files. Multidimensional data mining
                               (also known as exploratory multidimensional data mining, online analytical mining,
                               or OLAM) integrates OLAP with data mining to uncover knowledge in multidimen-
                               sional databases. Among the many different paradigms and architectures of data mining
                               systems, multidimensional data mining is particularly important for the following
                               reasons:

                                 High quality of data in data warehouses: Most data mining tools need to work on
                                 integrated, consistent, and cleaned data, which requires costly data cleaning, data
                                 integration, and data transformation as preprocessing steps. A data warehouse
                                 constructed by such preprocessing serves as a valuable source of high-quality data for
                                 OLAP as well as for data mining. Notice that data mining may serve as a valuable
                                 tool for data cleaning and data integration as well.

                                 Available information processing infrastructure surrounding data warehouses:
                                 Comprehensive information processing and data analysis infrastructures have been
                                 or will be systematically constructed surrounding data warehouses, which include
                                 accessing, integration, consolidation, and transformation of multiple heterogeneous
                                 databases, ODBC/OLEDB connections, Web accessing and service facilities, and
                                 reporting and OLAP analysis tools. It is prudent to make the best use of the available
                                 infrastructures rather than constructing everything from scratch.

                                 OLAP-based exploration of multidimensional data: Effective data mining needs
                                 exploratory data analysis. A user will often want to traverse through a database, select
                                 portions of relevant data, analyze them at different granularities, and present
                                 knowledge/results in different forms. Multidimensional data mining provides facilities for
                                 mining on different subsets of data and at varying levels of abstraction, by drilling,
                                 pivoting, filtering, dicing, and slicing on a data cube and/or intermediate data mining
                                 results. This, together with data/knowledge visualization tools, greatly enhances
                                 the power and flexibility of data mining.

                                 Online selection of data mining functions: Users may not always know the specific
                                 kinds of knowledge they want to mine. By integrating OLAP with various data mining
                                 functions, multidimensional data mining provides users with the flexibility to
                                 select desired data mining functions and swap data mining tasks dynamically.
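
To make the last two points concrete, the following is a minimal sketch in Python of mining on an OLAP-selected subset with a mining function chosen at run time. It is an illustration under stated assumptions, not an implementation taken from any OLAM system: the fact table SALES, the dimension names, and the helpers dice, roll_up, top_seller, above_average_cells, and mine are all hypothetical, and the two "mining functions" are deliberately trivial stand-ins for real data mining algorithms.

```python
from collections import defaultdict

# Hypothetical fact table: (city, quarter, item, units_sold).
# All values are invented for illustration only.
DIMS = ("city", "quarter", "item")
SALES = [
    ("Vancouver", "Q1", "computer", 10),
    ("Vancouver", "Q1", "phone", 4),
    ("Vancouver", "Q2", "computer", 6),
    ("Vancouver", "Q2", "phone", 9),
    ("Toronto", "Q1", "computer", 3),
    ("Toronto", "Q1", "phone", 8),
    ("Toronto", "Q2", "computer", 5),
    ("Toronto", "Q2", "phone", 7),
]


def dice(rows, **criteria):
    """Keep rows whose dimension values lie in the allowed sets.

    A slice is the special case with a single dimension and a single value.
    With no criteria, all rows are returned.
    """
    idx = {d: i for i, d in enumerate(DIMS)}
    return [
        r for r in rows
        if all(r[idx[d]] in allowed for d, allowed in criteria.items())
    ]


def roll_up(rows, keep):
    """Sum units_sold over every dimension not listed in `keep`."""
    positions = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[i] for i in positions)] += r[-1]
    return dict(totals)


# Two toy stand-ins for data mining functions (assumptions, not real algorithms).
def top_seller(rows):
    """Return the item with the largest total units in the selected subset."""
    totals = roll_up(rows, ["item"])
    return max(totals, key=totals.get)[0]


def above_average_cells(rows):
    """Return (city, quarter) cells whose total exceeds the mean cell total."""
    cells = roll_up(rows, ["city", "quarter"])
    mean = sum(cells.values()) / len(cells)
    return sorted(k for k, v in cells.items() if v > mean)


MINING_FUNCTIONS = {
    "top_seller": top_seller,
    "above_average_cells": above_average_cells,
}


def mine(rows, function_name, **criteria):
    """Select a subcube with dice(), then apply the chosen mining function."""
    return MINING_FUNCTIONS[function_name](dice(rows, **criteria))


if __name__ == "__main__":
    print(roll_up(SALES, ["city"]))                      # roll up to the city level
    print(mine(SALES, "top_seller", quarter={"Q1"}))     # mine on a slice
    print(mine(SALES, "top_seller", city={"Toronto"}))   # same function, other slice
    print(mine(SALES, "above_average_cells"))            # swap the mining task
```

The design point is that the selection step (dice, roll_up) and the mining step (the entries of MINING_FUNCTIONS) are independent, so a user can change either the subset and granularity or the mining task without touching the other. A real system would operate on a materialized data cube rather than a Python list.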

                                 Chapter 5 describes data warehouses on a finer level by exploring implementation
                               issues such as data cube computation, OLAP query answering strategies, and multi-
                               dimensional data mining. The chapters following it are devoted to the study of data
                               mining techniques. As we have seen, the introduction to data warehousing and OLAP
                               technology presented in this chapter is essential to our study of data mining. This
                               is because data warehousing provides users with large amounts of clean, organized,