Page 298 -
P. 298

HAN
                                                            2011/6/1
                                                                     3:20 Page 261
                                                                                    #19
                               13-ch06-243-278-9780123814791
                                                               6.2 Frequent Itemset Mining Methods  261


                     Table 6.3 The Vertical Data Format of the Transaction Data
                               Set D of Table 6.1
                               itemset  TID set
                               I1     {T100, T400, T500, T700, T800, T900}
                               I2     {T100, T200, T300, T400, T600, T800, T900}
                               I3     {T300, T500, T600, T700, T800, T900}
                               I4     {T200, T400}
                               I5     {T100, T800}


                     Table 6.4 2-Itemsets in Vertical Data Format
                               itemset    TID set
                               {I1, I2}   {T100, T400, T800, T900}
                               {I1, I3}   {T500, T700, T800, T900}
                               {I1, I4}   {T400}
                               {I1, I5}   {T100, T800}
                               {I2, I3}   {T300, T600, T800, T900}
                               {I2, I4}   {T200, T400}
                               {I2, I5}   {T100, T800}
                               {I3, I5}   {T800}



                     Table 6.5 3-Itemsets in Vertical Data Format
                               itemset           TID set
                               {I1, I2, I3}      {T800, T900}
                               {I1, I2, I5}      {T100, T800}


                               frequent in Table 6.3, there are 10 intersections performed in total, which lead to eight
                               nonempty 2-itemsets, as shown in Table 6.4. Notice that because the itemsets {I1, I4}
                               and {I3, I5} each contain only one transaction, they do not belong to the set of frequent
                               2-itemsets.
                                 Based on the Apriori property, a given 3-itemset is a candidate 3-itemset only if every
                               one of its 2-itemset subsets is frequent. The candidate generation process here will gen-
                               erate only two 3-itemsets: {I1, I2, I3} and {I1, I2, I5}. By intersecting the TID sets of any
                               two corresponding 2-itemsets of these candidate 3-itemsets, it derives Table 6.5, where
                               there are only two frequent 3-itemsets: {I1, I2, I3: 2} and {I1, I2, I5: 2}.

                                 Example 6.6 illustrates the process of mining frequent itemsets by exploring the
                               vertical data format. First, we transform the horizontally formatted data into the
                               vertical format by scanning the data set once. The support count of an itemset is simply
                               the length of the TID set of the itemset. Starting with k = 1, the frequent k-itemsets
                               can be used to construct the candidate (k + 1)-itemsets based on the Apriori property.
   293   294   295   296   297   298   299   300   301   302   303