Q9-3 How Do Organizations Use Data Warehouses and Data Marts to Acquire Data?
Consider the first problem: too many attributes. Suppose we want to know the factors that
influence how customers respond to a promotion. If we combine internal customer data with
purchased customer data, we will have more than a hundred different attributes to consider. How
do we select among them? In some cases, analysts can ignore the columns they don’t need. But
in more sophisticated data mining analyses, too many attributes can be problematic. Because of
a phenomenon called the curse of dimensionality, the more attributes there are, the easier it is to
build a model that fits the sample data but that is worthless as a predictor. There are other good
reasons for reducing the number of attributes, and one of the major activities in data mining
concerns efficient and effective ways of selecting attributes.
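The effect can be seen in a small simulation. The sketch below is only an illustration under stated assumptions, not a method taken from this chapter: the data are synthetic, the outcome depends on a single attribute, every other attribute is pure noise, and the model is a simple 1-nearest-neighbor rule chosen because it always fits its own sample perfectly. All function names are hypothetical.

```python
import random

def make_rows(n_rows, n_noise, rng):
    """Each row has one informative attribute plus n_noise pure-noise attributes."""
    rows = []
    for _ in range(n_rows):
        signal = rng.gauss(0, 1)
        label = 1 if signal > 0 else 0          # outcome depends only on `signal`
        noise = [rng.gauss(0, 1) for _ in range(n_noise)]
        rows.append(([signal] + noise, label))
    return rows

def predict(train, x):
    """1-nearest-neighbor: copy the label of the closest training row."""
    nearest = min(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)),
    )
    return nearest[1]

def accuracy(train, rows):
    """Share of rows whose label the model predicts correctly."""
    hits = sum(predict(train, x) == y for x, y in rows)
    return hits / len(rows)

def experiment(n_noise, seed=0):
    """Return (fit on the sample itself, accuracy on fresh holdout rows)."""
    rng = random.Random(seed)
    train = make_rows(150, n_noise, rng)
    holdout = make_rows(150, n_noise, rng)
    return accuracy(train, train), accuracy(train, holdout)

for n_noise in (0, 10, 100):
    fit, held = experiment(n_noise)
    print(f"{n_noise + 1:>3} attributes: sample fit {fit:.2f}, holdout accuracy {held:.2f}")
```

The fit to the sample stays perfect no matter how many attributes are included, so the sample fit alone says nothing about predictive value. Accuracy on rows the model has not seen is the number to watch as noise attributes are added.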
The second way to have an excess of data is to have too many data points—too many rows
of data. Suppose we want to analyze clickstream data on CNN.com. How many clicks does that
site receive per month? Millions upon millions! In order to meaningfully analyze such data we
need to reduce the amount of data. One good solution to this problem is statistical sampling.
Organizations should not be reluctant to sample data in such situations.
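As a rough illustration of why sampling is acceptable, the sketch below draws a simple random sample from a simulated clickstream and compares a statistic computed on the sample with the same statistic on the full data. The click records, the `is_video` flag, and the 30 percent rate are invented for the example and do not describe any real site.

```python
import random

rng = random.Random(42)

# Hypothetical clickstream: one record per click, flagged if it hit a video page.
clicks = [
    {"click_id": i, "is_video": rng.random() < 0.3}
    for i in range(200_000)
]

def video_share(rows):
    """Fraction of clicks that landed on a video page."""
    return sum(r["is_video"] for r in rows) / len(rows)

# Simple random sample without replacement: 1 percent of the rows.
sample = rng.sample(clicks, 2_000)

print(f"all clicks: {video_share(clicks):.3f}")
print(f"1% sample : {video_share(sample):.3f}")
```

With a sample of a few thousand rows, the estimate typically lands within a percentage point or two of the full-data value, at a small fraction of the processing cost.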
Data Warehouses Versus Data Marts
To understand the difference between data warehouses and data marts, think of a data warehouse
as a distributor in a supply chain. The data warehouse takes data from the data manufacturers
(operational systems and other sources), cleans and processes the data, and locates the data on
the shelves, so to speak, of the data warehouse. The data analysts who work with a data warehouse
are experts at data management, data cleaning, data transformation, data relationships, and the
like. However, they are not usually experts in a given business function.
A data mart is a data collection, smaller than the data warehouse, that addresses the needs
of a particular department or functional area of the business. If the data warehouse is the distributor
in a supply chain, then a data mart is like a retail store in a supply chain. Users in the data
mart obtain data that pertain to a particular business function from the data warehouse. Such
users do not have the data management expertise that data warehouse employees have, but they
are knowledgeable analysts for a given business function.
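The distributor and retail store analogy can be sketched in a few lines. The example below is a minimal sketch that assumes an in-memory SQLite database stands in for the warehouse DBMS; the table names, the `build_mart` helper, the sample rows, and the cleaning rule (reject rows with a missing amount) are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Raw rows as they might arrive from operational systems (hypothetical data).
raw_rows = [
    ("web", "laptop", 1200.0),
    ("store", "laptop", 1150.0),
    ("web", "mouse", None),          # missing amount: rejected during cleaning
    ("store", "keyboard", 80.0),
    ("web", "keyboard", 75.0),
]

# Warehouse step: load only the rows that pass a basic cleaning rule.
conn.execute(
    "CREATE TABLE warehouse_sales (channel TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [row for row in raw_rows if row[2] is not None],
)

def build_mart(conn, mart_name, channel):
    """Copy the slice of warehouse data that one business function needs."""
    conn.execute(f"CREATE TABLE {mart_name} (product TEXT, amount REAL)")
    conn.execute(
        f"INSERT INTO {mart_name} "
        "SELECT product, amount FROM warehouse_sales WHERE channel = ?",
        (channel,),
    )

build_mart(conn, "web_sales_mart", "web")
build_mart(conn, "store_sales_mart", "store")

for mart in ("web_sales_mart", "store_sales_mart"):
    rows = conn.execute(f"SELECT product, amount FROM {mart}").fetchall()
    print(mart, rows)
```

Cleaning happens once, in the warehouse, and each mart receives only data that has already been cleaned. That mirrors the division of expertise described above: data management skills sit with the warehouse staff, and business-function knowledge sits with the mart's users.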
Figure 9-15 illustrates these relationships. In this example, the data warehouse takes data
from the data producers and distributes the data to three data marts. One data mart is used to
analyze clickstream data for the purpose of designing Web pages. A second analyzes store sales
data and determines which products tend to be purchased together. This information is used to
Figure 9-15 Data Mart Examples
[Figure: Data producers feed the data warehouse DBMS (data warehouse metadata and data warehouse database), which supplies three data marts. Web Sales Data Mart: Web log data, BI tools for Web clickstream analysis, Web page design features. Store Sales Data Mart: store sales data, BI tools for store management, market-basket analysis for sales training. Inventory Data Mart: inventory history data, BI tools for inventory management, inventory layout for optimal item picking.]

