Page 163 -
P. 163
HAN 11-ch04-125-186-9780123814791
2011/6/1
126 Chapter 4 Data Warehousing and Online Analytical Processing 3:17 Page 126 #2
systems and data warehouses (Section 4.1.2), then explain the need for using data ware-
houses for data analysis, rather than performing the analysis directly on traditional
databases (Section 4.1.3). This is followed by a presentation of data warehouse architec-
ture (Section 4.1.4). Next, we study three data warehouse models—an enterprise model,
a data mart, and a virtual warehouse (Section 4.1.5). Section 4.1.6 describes back-end
utilities for data warehousing, such as extraction, transformation, and loading. Finally,
Section 4.1.7 presents the metadata repository, which stores data about data.
4.1.1 What Is a Data Warehouse?
Data warehousing provides architectures and tools for business executives to system-
atically organize, understand, and use their data to make strategic decisions. Data
warehouse systems are valuable tools in today’s competitive, fast-evolving world. In the
last several years, many firms have spent millions of dollars in building enterprise-wide
data warehouses. Many people feel that with competition mounting in every industry,
data warehousing is the latest must-have marketing weapon—a way to retain customers
by learning more about their needs.
“Then, what exactly is a data warehouse?” Data warehouses have been defined in many
ways, making it difficult to formulate a rigorous definition. Loosely speaking, a data
warehouse refers to a data repository that is maintained separately from an organiza-
tion’s operational databases. Data warehouse systems allow for integration of a variety of
application systems. They support information processing by providing a solid platform
of consolidated historic data for analysis.
According to William H. Inmon, a leading architect in the construction of data
warehouse systems, “A data warehouse is a subject-oriented, integrated, time-variant,
and nonvolatile collection of data in support of management’s decision making pro-
cess” [Inm96]. This short but comprehensive definition presents the major features of
a data warehouse. The four keywords—subject-oriented, integrated, time-variant, and
nonvolatile—distinguish data warehouses from other data repository systems, such as
relational database systems, transaction processing systems, and file systems.
Let’s take a closer look at each of these key features.
Subject-oriented: A data warehouse is organized around major subjects such as cus-
tomer, supplier, product, and sales. Rather than concentrating on the day-to-day
operations and transaction processing of an organization, a data warehouse focuses
on the modeling and analysis of data for decision makers. Hence, data warehouses
typically provide a simple and concise view of particular subject issues by excluding
data that are not useful in the decision support process.
Integrated: A data warehouse is usually constructed by integrating multiple hetero-
geneous sources, such as relational databases, flat files, and online transaction
records. Data cleaning and data integration techniques are applied to ensure con-
sistency in naming conventions, encoding structures, attribute measures, and so on.