Chapter 3 • Data Warehousing 123
the large amounts of data contained in a typical data warehouse. A recent survey on
parallel and distributed data warehouses can be found in Furtado (2009). Teradata
(teradata.com) has successfully adopted this approach and has often commented on its
novel implementation.
• Will data migration tools be used to load the data warehouse? Moving
data from an existing system into a data warehouse is a tedious and laborious task.
Depending on the diversity and the location of the data assets, migration may be
a relatively simple procedure or a months-long project. The results of a thorough
assessment of the existing data assets should be used to determine whether to use
migration tools and, if so, what capabilities to seek in those commercial tools.
• What tools will be used to support data retrieval and analysis? Often it
is necessary to use specialized tools to periodically locate, access, analyze, extract,
transform, and load necessary data into a data warehouse. A decision has to be
made on (1) developing the migration tools in-house, (2) purchasing them from a
third-party provider, or (3) using the ones provided with the data warehouse system.
Highly complex, real-time migrations warrant specialized third-party ETL (extract,
transform, and load) tools.
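To make the extract, transform, and load steps concrete, the sketch below shows roughly the minimum an in-house migration tool would have to do. It does not represent any vendor's product. The CSV layout, the `sales_fact` table, and the cleaning rules are hypothetical, and Python's built-in `sqlite3` module stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical export from an operational (source) system.
SOURCE_CSV = """order_id,customer,amount,order_date
1001, Acme Corp ,250.00,2013-11-02
1002,globex,  99.50,2013-11-03
1003,Acme Corp,,2013-11-03
"""

def extract(text):
    """Extract: read the raw rows from the source export."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: clean and standardize rows, rejecting incomplete ones."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip records with a missing amount instead of loading them
        cleaned.append((
            int(row["order_id"]),
            row["customer"].strip().title(),  # unify customer name spelling
            float(amount),
            row["order_date"].strip(),
        ))
    return cleaned

def load(conn, rows):
    """Load: append the cleaned rows to a (hypothetical) warehouse fact table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
loaded = load(conn, transform(extract(SOURCE_CSV)))
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM sales_fact GROUP BY customer ORDER BY customer"
).fetchall()
print(loaded)
print(totals)
```

Even this toy version has to make decisions about rejected records and inconsistent spellings. Commercial or bundled ETL tools typically add scheduling, error logging, incremental loads, and connectors to many source systems, and weighing those features against the cost of building them in-house is the decision the question above describes.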
Alternative Data Warehousing Architectures
At the highest level, data warehouse architecture design viewpoints can be categorized
into enterprise-wide data warehouse (EDW) design and data mart (DM) design (Golfarelli
and Rizzi, 2009). In Figure 3.7 (parts a–e), we show some alternatives to the basic
architectural design types that are neither pure EDW nor pure DM, but in between or
beyond the traditional architectural structures. Notable new ones include hub-and-spoke
and federated architectures. The five architectures shown in Figure 3.7 (parts a–e) were
proposed by Ariyachandra and Watson (2005, 2006a, and 2006b). Previously, in an extensive
study, Sen and Sinha (2005) identified 15 different data warehousing methodologies. The
sources of these methodologies are classified into three broad categories: core-technology
vendors, infrastructure vendors, and information-modeling companies.
a. Independent data marts. This is arguably the simplest and the least costly
architecture alternative. The data marts are developed to operate independently of
each other to serve the needs of individual organizational units. Because of their
independence, they may have inconsistent data definitions and different dimensions and
measures, making it difficult to analyze data across the data marts (i.e., it is difficult,
if not impossible, to get to the “one version of the truth”).
b. Data mart bus architecture. This architecture is a viable alternative to
independent data marts: the individual marts are linked to each other via some
kind of middleware. Because the data are linked among the individual marts, there
is a better chance of maintaining data consistency across the enterprise (at least at
the metadata level). Although this architecture allows complex queries across data
marts, the performance of such analyses may not be satisfactory.
c. Hub-and-spoke architecture. This is perhaps the most famous data warehousing
architecture today. Here the attention is focused on building a scalable and
maintainable infrastructure (often developed in an iterative way, subject area by
subject area) that includes a centralized data warehouse and several dependent data
marts (each for an organizational unit). This architecture allows for easy customization
of user interfaces and reports. On the negative side, it lacks the holistic enterprise
view and may lead to data redundancy and data latency.
d. Centralized data warehouse. The centralized data warehouse architecture is
similar to the hub-and-spoke architecture except that there are no dependent data
marts; instead, there is a gigantic enterprise data warehouse that serves the needs