Page 312 -
P. 312
13-ch06-243-278-9780123814791
#33
3:20 Page 275
2011/6/1
HAN
6.5 Exercises 275
6.9 Suppose that a large store has a transactional database that is distributed among
four locations. Transactions in each component database have the same for-
mat, namely T j : {i 1 ,...,i m }, where T j is a transaction identifier, and i k (1 ≤
k ≤ m) is the identifier of an item purchased in the transaction. Propose an
efficient algorithm to mine global association rules. You may present your algo-
rithm in the form of an outline. Your algorithm should not require shipping
all the data to one site and should not cause excessive network communication
overhead.
6.10 Suppose that frequent itemsets are saved for a large transactional database, DB.
Discuss how to efficiently mine the (global) association rules under the same
minimum support threshold, if a set of new transactions, denoted as 1DB, is
(incrementally) added in?
6.11 Most frequent pattern mining algorithms consider only distinct items in a transac-
tion. However, multiple occurrences of an item in the same shopping basket, such
as four cakes and three jugs of milk, can be important in transactional data analysis.
How can one mine frequent itemsets efficiently considering multiple occurrences
of items? Propose modifications to the well-known algorithms, such as Apriori and
FP-growth, to adapt to such a situation.
6.12 (Implementation project) Many techniques have been proposed to further
improve the performance of frequent itemset mining algorithms. Taking FP-tree–
based frequent pattern growth algorithms (e.g., FP-growth) as an example, imple-
ment one of the following optimization techniques. Compare the performance of
your new implementation with the unoptimized version.
(a) The frequent pattern mining method of Section 6.2.4 uses an FP-tree to gen-
erate conditional pattern bases using a bottom-up projection technique (i.e.,
project onto the prefix path of an item p). However, one can develop a top-
down projection technique, that is, project onto the suffix path of an item p in
the generation of a conditional pattern base. Design and implement such a top-
down FP-tree mining method. Compare its performance with the bottom-up
projection method.
(b) Nodes and pointers are used uniformly in an FP-tree in the FP-growth algo-
rithm design. However, such a structure may consume a lot of space when
the data are sparse. One possible alternative design is to explore array- and
pointer-based hybrid implementation, where a node may store multiple items
when it contains no splitting point to multiple sub-branches. Develop such an
implementation and compare it with the original one.
(c) It is time and space consuming to generate numerous conditional pattern bases
during pattern-growth mining. An interesting alternative is to push right the
branches that have been mined for a particular item p, that is, to push them to
the remaining branch(es) of the FP-tree. This is done so that fewer conditional
pattern bases have to be generated and additional sharing can be explored when
mining the remaining FP-tree branches. Design and implement such a method
and conduct a performance study on it.