Page 202 - Building Big Data Applications


             the data, and most important of all, the need to curb the islands of misfit toys from
             recurring. In response to these challenges, we have evolved the data swamp and data lake
             layers. The ability to store data in raw format and defer its modeling until the time of
             analysis, together with the compelling economics of cloud storage and distributed file
             systems, has provided answers to manage the problem. The new infrastructure
             model has evolved quickly and created many offerings for different kinds of enterprises
             based on size, complexity, usage, and data (Fig. 11.5).
                As we have evolved the model of computing in the cloud for the enterprise and
             successfully adopted the data lake model, the data warehouse is still needed for
             corporate analytical computing. It has expanded with the addition of the data lake and
             data swamp layers upstream and analytical data hubs downstream. This means we
             need to manage the data journey from the swamp to the hub, and maintain all lineage,
             traceability, and transformation logistics, which need to be available on demand.
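
                A minimal sketch of what keeping lineage available on demand could look like is
             given below. It is an illustration only, not a method described in the text: the class
             names LineageStep and LineageTrail, the dataset names, and the transformation
             descriptions are all hypothetical. The layer names (swamp, lake, warehouse, hub)
             follow the layers named above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class LineageStep:
    """One hop in the data journey (hypothetical structure)."""
    layer: str            # e.g. "swamp", "lake", "warehouse", "hub"
    dataset: str          # name of the dataset produced at this layer
    transformation: str   # free-text description of what was done
    recorded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


@dataclass
class LineageTrail:
    """Ordered record of every step applied to one source."""
    source: str
    steps: List[LineageStep] = field(default_factory=list)

    def record(self, layer: str, dataset: str, transformation: str) -> "LineageTrail":
        # Append a step and return self so calls can be chained.
        self.steps.append(LineageStep(layer, dataset, transformation))
        return self

    def path(self) -> List[str]:
        # The sequence of layers the data has passed through so far.
        return [step.layer for step in self.steps]

    def trace(self, dataset: str) -> List[LineageStep]:
        # All steps up to and including the one that produced `dataset`.
        for index, step in enumerate(self.steps):
            if step.dataset == dataset:
                return self.steps[: index + 1]
        raise KeyError(f"no lineage recorded for {dataset!r}")


# Hypothetical journey of one source from the swamp to an analytical hub.
trail = LineageTrail(source="clickstream.json")
trail.record("swamp", "raw/clickstream", "ingested in raw format")
trail.record("lake", "lake/clickstream_parsed", "parsed and schema applied at read time")
trail.record("warehouse", "dw.fact_clicks", "conformed to warehouse dimensions")
trail.record("hub", "hub.marketing_clicks", "aggregated for marketing analytics")

upstream = trail.trace("dw.fact_clicks")
```

                Here trail.path() lists the layers in order, and trail.trace("dw.fact_clicks")
             returns the three steps that led to the warehouse table, which is the kind of
             on-demand traceability the paragraph above calls for. A production system would
             persist these records in a metadata catalog rather than in memory.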
                The multiple layers need to coexist, and our tools and platforms need to be designed
             and architected to accommodate this heterogeneity. That coexistence is not well
             accommodated by the tools market, which is largely split along the line between the
             data warehouse and the data lake. Older, established tools that predate Hadoop and
             data lakes were designed to work with relational database management systems. Newer
             tools that grew up in the big data era are more focused on managing individual data
             files kept in cloud storage systems such as Amazon S3 or distributed file systems such
             as Hadoop's HDFS. The foundational issue is how to marry the two (Fig. 11.6).
                Enterprises do not want a broken tool chain; they want technologies that can straddle the
             line and work with platforms on either side of it. They all have multiple sets of data
             technologies, and the data in each must be leveraged together to benefit the enterprise. This
             requires the different databases, data warehouses, data swamps, and other systems with all

                                           FIGURE 11.5 Cloud computing.