                                                                   Q9-5  How Do Organizations Use Data Mining Applications?

                                               of dive computers, only 20 appeared with fins. So the likelihood of someone buying fins, given he
                                               or she bought a dive computer, is 20/120, or .1666. Thus, when someone buys a dive computer,
                                               the likelihood that he or she will also buy fins falls from the base probability of .7 to .1666.
                                                   The ratio of confidence to the base probability of buying an item is called lift. Lift shows how
                                               much the base probability increases or decreases when other products are purchased. The lift of
                                               fins and a mask is the confidence of fins given a mask, divided by the base probability of fins. In
                                               Figure 9-21, the lift of fins and a mask is .926/.7, or 1.32. Thus, the likelihood that people buy fins
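                                               when they buy a mask increases by 32 percent. Perhaps surprisingly, the lift of fins and a mask
                                               is the same as the lift of a mask and fins (both are 1.32), because each equals the probability of
                                               buying both items divided by the product of the two items' base probabilities.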
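The confidence and lift arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular tool: the function names are hypothetical, the first two calculations use only the numbers quoted in the text, and the counts used to show the symmetry of lift are invented for illustration and are not taken from Figure 9-21.

```python
def confidence(pair_count, antecedent_count):
    """Share of carts containing the antecedent that also contain the other item."""
    return pair_count / antecedent_count


def lift(conf, base_probability):
    """Ratio of confidence to the base probability of the item being predicted."""
    return conf / base_probability


def lift_from_counts(n_both, n_a, n_b, n_total):
    """Lift written with raw counts; swapping n_a and n_b leaves the result unchanged."""
    return (n_both / n_total) / ((n_a / n_total) * (n_b / n_total))


# Values quoted in the text.
conf_fins_given_computer = confidence(20, 120)  # 20 of 120 dive computers appeared with fins
lift_fins_mask = lift(0.926, 0.7)               # confidence .926, base probability of fins .7

# Illustrative counts only (NOT from Figure 9-21):
# 100 carts, 40 containing item A, 50 containing item B, 30 containing both.
lift_a_b = lift_from_counts(30, 40, 50, 100)
lift_b_a = lift_from_counts(30, 50, 40, 100)

print(round(conf_fins_given_computer, 4))
print(round(lift_fins_mask, 2))
print(lift_a_b == lift_b_a)
```

Writing lift in terms of counts makes the symmetry visible: the two base probabilities are simply multiplied together, so the order of the items does not matter.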
                                                   We need to be careful here, though, because this analysis shows only shopping carts with
                                               two items. We cannot say from this data what the likelihood is that customers, given that they
                                               bought a mask, will buy both weights and fins. To assess that probability, we need to analyze
                                               shopping carts with three items. This statement illustrates, once again, that we need to know
                                               what problem we’re solving before we start to build the information system to mine the data. The
                                               problem definition will help us decide whether we need to analyze shopping carts of three items,
                                               four items, or some other size.
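To make the three-item case concrete, the sketch below estimates the likelihood that a cart contains both weights and fins, given that it contains a mask. The carts are invented for illustration and the helper name is hypothetical; the point is only that the calculation needs carts recorded with all of their items, not just pairs.

```python
def conditional_probability(carts, given, targets):
    """Share of carts containing every item in `given` that also contain every item in `targets`.

    Returns None when no cart contains the `given` items.
    """
    given, targets = set(given), set(targets)
    with_given = [cart for cart in carts if given <= set(cart)]
    if not with_given:
        return None
    with_all = [cart for cart in with_given if targets <= set(cart)]
    return len(with_all) / len(with_given)


# Hypothetical carts, NOT data from the dive shop example.
carts = [
    {"mask", "fins", "weights"},
    {"mask", "fins"},
    {"mask", "weights"},
    {"fins", "dive computer"},
    {"mask", "fins", "weights"},
]

p_weights_and_fins_given_mask = conditional_probability(carts, {"mask"}, {"weights", "fins"})
print(p_weights_and_fins_given_mask)
```

With these made-up carts, four contain a mask and two of those also contain both weights and fins.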
                                                   Many organizations are benefiting from market-basket analysis today. You can expect that
                                               this technique will become a standard CRM analysis during your career.


                                               Decision Trees

                                               A decision tree is a hierarchical arrangement of criteria that predict a classification or a value.
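                                               Here we will consider decision trees that predict classifications. In standard data mining
                                               terminology, decision tree analysis is a supervised technique, because the tree is learned from
                                               records whose classification is already known. The analyst does not specify the criteria in
                                               advance, however: the analyst sets up the computer program and provides the data to analyze,
                                               and the decision tree program produces the tree.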
                                                   A common business application of decision trees is to classify loans by likelihood of default.
                                               Organizations analyze data from past loans to produce a decision tree that can be converted to
                                               loan-decision rules. A financial institution could use such a tree to assess the default risk on a
                                               new loan. Sometimes, too, financial institutions sell a group of loans (called a loan portfolio) to one
                                               another. An institution considering the purchase of a loan portfolio can use the results of a
                                               decision tree program to evaluate the risk of a given portfolio.
                                                   Figure 9-22 shows an example provided by Insightful Corporation, a vendor of BI tools.
                                               This example was generated using its Insightful Miner product. This tool examined data from
                                               3,485 loans. Of those loans, 72 percent had no default and 28 percent did default. To perform the
                                               analysis, the decision tree tool examined six different loan characteristics.
                                                   In this example, the decision tree program determined that the percentage of the loan that
                                               is past due (PercPastDue) is the best first criterion. Reading Figure 9-22, you can see that of the
                                               2,574 loans with a PercPastDue value of 0.5 or less (the amount past due is at most half the loan
                                               amount), 94 percent were not in default. Reading down several lines in this tree, 911 loans had a
                                               value of PercPastDue greater than 0.5; of those loans, 89 percent were in default.
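The figures quoted for this first split can be checked with a little arithmetic. The sketch below confirms that the two branches add up to the 3,485 loans and that their no-default rates are consistent with the overall 72 percent. It also computes Gini impurity before and after the split. Gini impurity is one common measure of how mixed a group is; the text does not say which measure Insightful Miner uses, so treat that part as an assumption made for illustration.

```python
def gini(p_default):
    """Gini impurity of a two-class group, given the share of loans in default."""
    return 1 - p_default**2 - (1 - p_default) ** 2


total_loans = 3485
low_count, low_no_default = 2574, 0.94   # PercPastDue of 0.5 or less: 94 percent not in default
high_count, high_default = 911, 0.89     # PercPastDue greater than 0.5: 89 percent in default

# The two branches should account for every loan.
counts_match = (low_count + high_count == total_loans)

# Weighted no-default rate across both branches; should be close to the overall 72 percent.
overall_no_default = (
    low_count * low_no_default + high_count * (1 - high_default)
) / total_loans

# Assumed measure: Gini impurity, using the rounded percentages from the text.
parent_impurity = gini(0.28)
split_impurity = (
    low_count * gini(1 - low_no_default) + high_count * gini(high_default)
) / total_loans

print(counts_match)
print(round(overall_no_default, 2))
print(round(parent_impurity, 4), round(split_impurity, 4))
```

A lower impurity after the split means each branch is more uniform than the original group, which is the sense in which a criterion can be called a good first split.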
                                                   These two major categories are then further subdivided using three additional criteria: CreditScore
                                               is a creditworthiness score obtained from a credit agency; MonthsPastDue is the number of
                                               months since a payment; and CurrentLTV is the current ratio of outstanding balance of the loan
                                               to the value of the loan’s collateral.
                                                   With a decision tree like this, the financial institution can develop decision rules for accepting
                                               or rejecting the offer to purchase loans from another financial institution. For example:

                                                   •  If percent past due is 50 percent or less, then accept the loan.
                                                   •  If percent past due is greater than 50 percent, and CreditScore is greater than 572.6, and
                                                      CurrentLTV is less than .94, then accept the loan.
                                                   •  Otherwise, reject the loan.
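Rules like these translate directly into code. The following is a minimal sketch, assuming the thresholds quoted above (0.5, 572.6, and .94); the function and parameter names are hypothetical. It treats a PercPastDue of exactly 0.5 as belonging to the first rule, following the "0.5 or less" split reported for the tree. MonthsPastDue does not appear because the rules listed do not use it.

```python
def loan_decision(perc_past_due, credit_score, current_ltv):
    """Apply the example decision rules and return "accept" or "reject".

    perc_past_due: fraction of the loan that is past due (0.5 means 50 percent)
    credit_score:  creditworthiness score from a credit agency
    current_ltv:   outstanding balance divided by the value of the collateral
    """
    # Rule 1: little of the loan is past due.
    if perc_past_due <= 0.5:
        return "accept"
    # Rule 2: more is past due, but the credit score and collateral are strong enough.
    if credit_score > 572.6 and current_ltv < 0.94:
        return "accept"
    # Everything else is rejected.
    return "reject"


print(loan_decision(0.30, 500, 1.20))
print(loan_decision(0.80, 600, 0.90))
print(loan_decision(0.80, 550, 0.90))
```

Because the rules are checked in order, the second rule is reached only when more than half of the loan is past due, so it does not need to test PercPastDue again.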