Knowledge Management Tools 273
Box 8.1
A vignette: Beer with your diapers
A chain of convenience stores conducted a market basket analysis to help in product
placement. Market basket analysis is a statistical analysis of items that consumers tend to
buy together (i.e., that are found in the same basket at checkout). One of their hypotheses
was that all infant care-related items should be placed together, so they ran a simple
correlation check to validate that mothers of newborns did in fact tend to buy items such
as baby powder or cream when they came in to purchase diapers. To their surprise, the highest correlation
for an item that tended to be bought at the same time as diapers (in the newborn size and
format) was in fact a case of beer. This was later explained by the observation that it was
the fathers of newborns who were more likely to be sent to the store to buy more diapers
and while they were there, they tended to pick up other equally essential items.
Variables may be correlated but this relationship may not have any meaning or
usefulness. For example, a major bank found that there was a relationship between
the state an applicant lived in and a higher percentage of defaults on loans given out.
This should not be the basis for a policy that would automatically reject any applicants
from that state! Reality checks are always needed with statistics before any conclusions
can be drawn, as in the remark often attributed to British statesman Benjamin Disraeli:
“There are three kinds of lies: lies, damned lies and statistics.”
Typical applications of data mining and knowledge discovery systems include
market segmentation, customer profiling, fraud detection, retail promotion evaluation,
credit risk analysis, and market basket analyses (as described in the vignette).
Beyond these routine uses, there are usually a few gems to be mined with data mining
applications. These are often unexpected correlations that, upon further study, yield useful
(and often actionable) insights into what is occurring. A famous example is that of
the relationship between purchases of beer and purchases of diapers.
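To make the idea of items "bought together" concrete, the following is a minimal sketch of a market basket calculation. The baskets are invented for illustration and are not data from the vignette, and the function name pair_stats is hypothetical. For every pair of items it computes the support (the share of baskets containing both items) and the lift (how much more often the pair occurs together than would be expected if the two items were bought independently). A lift above 1 suggests a positive association worth a closer look, not a proven cause.

```python
from collections import Counter
from itertools import combinations

# Hypothetical checkout baskets, invented purely for illustration.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "baby powder"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"diapers", "beer", "milk"},
]


def pair_stats(baskets):
    """Return support and lift for every item pair seen in the baskets."""
    n = len(baskets)
    # How many baskets contain each single item.
    item_counts = Counter(item for basket in baskets for item in basket)
    # How many baskets contain each pair (sorted so (a, b) == (b, a)).
    pair_counts = Counter(
        pair
        for basket in baskets
        for pair in combinations(sorted(basket), 2)
    )
    stats = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        # Expected co-occurrence rate if a and b were independent.
        expected = (item_counts[a] / n) * (item_counts[b] / n)
        stats[(a, b)] = {"support": support, "lift": support / expected}
    return stats


stats = pair_stats(baskets)
# List pairs from most to least frequently bought together.
for pair, s in sorted(stats.items(), key=lambda kv: -kv[1]["support"]):
    print(pair, f"support={s['support']:.2f}", f"lift={s['lift']:.2f}")
```

In practice, an analyst would run such a calculation over a far larger transaction log and then apply the kind of reality check described above before acting on any pair that stands out.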
Some data mining tools that are currently in use include:
• Statistical analysis tools (e.g., SAS)
• Data mining suites (e.g., EnterpriseMiner)
• Consulting/outsourcing tools from firms such as EDS, IBM, and Epsilon (note that these
tools are models, not just software)
• Data visualization software that coherently presents a large amount of information
in a small space. Such tools make use of the human computer (your eyes) to detect
patterns; for example, virtual reality and simulation software lets the user walk around
the data points.