
Chapter 3  •  Data Warehousing  123

                        the large amounts of data contained in a typical data warehouse. A recent survey on
                        parallel and distributed data warehouses can be found in Furtado (2009). Teradata
                        (teradata.com) has successfully adopted this approach and has often commented on its
                        novel implementation.
                       • Will data migration tools be used to load the data warehouse?  Moving
                        data from an existing system into a data warehouse is a tedious and laborious task.
                        Depending on the diversity and the location of the data assets, migration may be
                        a relatively simple procedure or (in contrast) a months-long project. The results
                        of a thorough assessment of the existing data assets should be used to determine
                        whether to use migration tools and, if so, what capabilities to seek in those com-
                        mercial tools.
                       • What tools will be used to support data retrieval and analysis?  Often it
                        is necessary to use specialized tools to periodically locate, access, analyze, extract,
                        transform, and load necessary data into a data warehouse. A decision must be made
                        whether to (1) develop the migration tools in-house, (2) purchase them from a
                        third-party provider, or (3) use the ones provided with the data warehouse system.
                        Complex, real-time migrations warrant specialized third-party ETL tools.
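The extract, transform, and load (ETL) steps mentioned in the last bullet can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the behavior of any particular vendor's tool: the source rows, the field names (`cust_id`, `amount`, `region`), and the in-memory SQLite table `sales_fact` are hypothetical stand-ins for an operational system and a warehouse fact table.

```python
import sqlite3

# Hypothetical operational source: raw rows as they might arrive from a legacy
# system, with inconsistent formatting that the transform step must clean up.
SOURCE_ROWS = [
    {"cust_id": " C001 ", "amount": "120.50", "region": "east"},
    {"cust_id": "C002", "amount": "80", "region": "WEST"},
    {"cust_id": "C003", "amount": "", "region": "East"},  # missing amount
]

def extract(rows):
    """Extract: pull raw records from the (hypothetical) source system."""
    for row in rows:
        yield dict(row)

def transform(records):
    """Transform: trim keys, cast amounts, standardize region codes, drop bad rows."""
    for rec in records:
        if not rec["amount"].strip():
            continue  # reject records that cannot be loaded
        yield (
            rec["cust_id"].strip(),
            float(rec["amount"]),
            rec["region"].strip().upper(),
        )

def load(conn, rows):
    """Load: insert the cleaned rows into the (hypothetical) warehouse fact table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact (cust_id TEXT, amount REAL, region TEXT)"
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
    conn.commit()

def run_etl(conn, source_rows):
    """Chain the three stages and return the number of rows now in the fact table."""
    load(conn, transform(extract(source_rows)))
    return conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]

if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    # Two rows are loaded; the row with no amount is rejected during transform.
    print(run_etl(warehouse, SOURCE_ROWS))
```

Whichever of the three sourcing options is chosen (in-house, third-party, or bundled), the tool performs some version of these three stages; commercial products typically wrap them with scheduling, monitoring, and error handling.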

                    Alternative Data Warehousing Architectures
                    At the highest level, data warehouse architecture design viewpoints can be categorized
                    into enterprise-wide data warehouse (EDW) design and data mart (DM) design (Golfarelli
                    and Rizzi, 2009). In Figure 3.7 (parts a–e), we show some alternatives to the basic archi-
                    tectural design types that are neither pure EDW nor pure DM, but in between or beyond
                    the traditional architectural structures. Notable new ones include hub-and-spoke and
                    federated architectures. The five architectures shown in Figure 3.7 (parts a–e) are pro-
                    posed by Ariyachandra and Watson (2005, 2006a, and 2006b). Previously, in an extensive
                    study, Sen and Sinha (2005) identified 15 different data warehousing methodologies. The
                    sources of these methodologies are classified into three broad categories: core-technology
                    vendors, infrastructure vendors, and information-modeling companies.
                     a.  Independent data marts.  This is arguably the simplest and the least costly
                        architecture alternative. The data marts are developed to operate independently of
                        one another to serve the needs of individual organizational units. Because of their
                        independence, they may have inconsistent data definitions and different dimensions and
                        measures, making it difficult to analyze data across the data marts (i.e., it is difficult,
                        if not impossible, to get to the “one version of the truth”).
                     b.  Data mart bus architecture.  This architecture is a viable alternative to the inde-
                        pendent data marts where the individual marts are linked to each other via some
                        kind of middleware. Because the data are linked among the individual marts, there
                        is a better chance of maintaining data consistency across the enterprise (at least at
                        the metadata level). Even though it allows for complex queries across data marts,
                        the performance of such analyses may not be satisfactory.
                     c.  Hub-and-spoke architecture.  This is perhaps the most famous data warehous-
                        ing architecture today. Here the attention is focused on building a scalable and
                        maintainable infrastructure (often developed in an iterative way, subject area by
                        subject area) that includes a centralized data warehouse and several dependent data
                        marts (each for an organizational unit). This architecture allows for easy customization
                        of user interfaces and reports. On the negative side, it lacks a holistic enterprise view
                        and may lead to data redundancy and data latency.
                     d.  Centralized data warehouse.  The centralized data warehouse architecture is
                        similar to the hub-and-spoke architecture except that there are no dependent data
                        marts; instead, there is a gigantic enterprise data warehouse that serves the needs







