Page 312 -
P. 312

13-ch06-243-278-9780123814791
                                                                                    #33
                                                                     3:20 Page 275
                                                            2011/6/1
                         HAN
                                                                                    6.5 Exercises  275


                               6.9 Suppose that a large store has a transactional database that is distributed among
                                   four locations. Transactions in each component database have the same for-
                                   mat, namely T j : {i 1 ,...,i m }, where T j is a transaction identifier, and i k (1 ≤
                                   k ≤ m) is the identifier of an item purchased in the transaction. Propose an
                                   efficient algorithm to mine global association rules. You may present your algo-
                                   rithm in the form of an outline. Your algorithm should not require shipping
                                   all the data to one site and should not cause excessive network communication
                                   overhead.

                              6.10 Suppose that frequent itemsets are saved for a large transactional database, DB.
                                   Discuss how to efficiently mine the (global) association rules under the same
                                   minimum support threshold, if a set of new transactions, denoted as 1DB, is
                                   (incrementally) added in?
                              6.11 Most frequent pattern mining algorithms consider only distinct items in a transac-
                                   tion. However, multiple occurrences of an item in the same shopping basket, such
                                   as four cakes and three jugs of milk, can be important in transactional data analysis.
                                   How can one mine frequent itemsets efficiently considering multiple occurrences
                                   of items? Propose modifications to the well-known algorithms, such as Apriori and
                                   FP-growth, to adapt to such a situation.
                              6.12 (Implementation project) Many techniques have been proposed to further
                                   improve the performance of frequent itemset mining algorithms. Taking FP-tree–
                                   based frequent pattern growth algorithms (e.g., FP-growth) as an example, imple-
                                   ment one of the following optimization techniques. Compare the performance of
                                   your new implementation with the unoptimized version.
                                   (a) The frequent pattern mining method of Section 6.2.4 uses an FP-tree to gen-
                                      erate conditional pattern bases using a bottom-up projection technique (i.e.,
                                      project onto the prefix path of an item p). However, one can develop a top-
                                      down projection technique, that is, project onto the suffix path of an item p in
                                      the generation of a conditional pattern base. Design and implement such a top-
                                      down FP-tree mining method. Compare its performance with the bottom-up
                                      projection method.
                                  (b) Nodes and pointers are used uniformly in an FP-tree in the FP-growth algo-
                                      rithm design. However, such a structure may consume a lot of space when
                                      the data are sparse. One possible alternative design is to explore array- and
                                      pointer-based hybrid implementation, where a node may store multiple items
                                      when it contains no splitting point to multiple sub-branches. Develop such an
                                      implementation and compare it with the original one.
                                   (c) It is time and space consuming to generate numerous conditional pattern bases
                                      during pattern-growth mining. An interesting alternative is to push right the
                                      branches that have been mined for a particular item p, that is, to push them to
                                      the remaining branch(es) of the FP-tree. This is done so that fewer conditional
                                      pattern bases have to be generated and additional sharing can be explored when
                                      mining the remaining FP-tree branches. Design and implement such a method
                                      and conduct a performance study on it.
   307   308   309   310   311   312   313   314   315   316   317