Page 204 - Building Big Data Applications

                Given the data volumes in today’s data lakes, both in the number of data sets and
             the size of each, discovering such relationships manually is so difficult as to be
             impractical. Automation is your savior: such relationships and flows can be detected
             algorithmically by analyzing data values, value distributions, formulas, and so forth.
             But how do we reach this level of sophistication? Who can guide us on this journey?
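Algorithmic detection of relationships can take many forms. As one illustration, and not the book's own method, the minimal Python sketch below flags candidate relationships between data sets using value containment: if nearly every distinct value of one column also appears in a column of another data set, the two columns may be linked (for example, as a foreign key). It assumes each data set is already loaded as a dictionary mapping column names to lists of hashable values; the function names, the sample data, and the 0.9 threshold are hypothetical.

```python
from itertools import combinations


def containment(child, parent):
    """Share of distinct non-null values in `child` that also occur in `parent`."""
    child_set = {v for v in child if v is not None}
    if not child_set:
        return 0.0
    parent_set = {v for v in parent if v is not None}
    return len(child_set & parent_set) / len(child_set)


def find_candidate_links(datasets, threshold=0.9):
    """Return (child, parent, score) tuples for columns in different data sets
    whose values are (almost) all contained in the other column.
    The 0.9 default threshold is an illustrative assumption."""
    columns = [
        (ds, col, values)
        for ds, cols in datasets.items()
        for col, values in cols.items()
    ]
    links = []
    for (ds_a, col_a, vals_a), (ds_b, col_b, vals_b) in combinations(columns, 2):
        if ds_a == ds_b:
            continue  # only look for relationships across data sets
        # Test containment in both directions: A inside B, then B inside A.
        for child, c_vals, parent, p_vals in (
            (f"{ds_a}.{col_a}", vals_a, f"{ds_b}.{col_b}", vals_b),
            (f"{ds_b}.{col_b}", vals_b, f"{ds_a}.{col_a}", vals_a),
        ):
            score = containment(c_vals, p_vals)
            if score >= threshold:
                links.append((child, parent, round(score, 2)))
    return links


# Hypothetical sample data standing in for two files in a data lake.
lake = {
    "orders": {"order_id": [1, 2, 3, 4], "cust_id": [10, 11, 10, 12]},
    "customers": {"id": [10, 11, 12, 13], "region": ["EU", "US", "EU", "APAC"]},
}
print(find_candidate_links(lake))  # [('orders.cust_id', 'customers.id', 1.0)]
```

Containment is only one signal; a production tool would likely combine it with comparisons of value distributions, column-name similarity, and data types before proposing a link for human review.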
                This is an issue for data management tools, too. If you are going to manage a
             comprehensive data catalog, then all the data across the enterprise must be in it. The
             data catalog must be well categorized, tagged, and managed with appropriate metadata,
             and the exercise should eventually help the enterprise break down its silos and
             integrate the data into one repository, with all the interfaces and access mechanisms
             established.
                When establishing and managing the catalogs, if data sets from the data lake are not
             properly cataloged, the lake will quickly become mismanaged and cause even greater
             frustration among users. This is especially true because of the physical format of a
             data lake: a collection of files in a folder structure. Similarly, if data sets in the
             databases are left out of the cataloging exercise, they will remain loose ends and
             create a mess once the data catalog is in operation. We need a tool to ensure that
             all of this happens in a governed manner and results in a compliant outcome (Fig. 11.7).
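Because a data lake is physically just files in folders, one basic governance check is to find files that have no catalog entry at all. The sketch below assumes, hypothetically, that the catalog is a Python dictionary keyed by each file's path relative to the lake root, with tags and other metadata as values; a real catalog would live in a dedicated metadata store, and the function name `find_uncataloged` is illustrative only.

```python
import tempfile
from pathlib import Path


def find_uncataloged(lake_root, catalog):
    """List files under lake_root whose relative path has no catalog entry.
    `catalog` is a hypothetical dict: relative POSIX path -> metadata."""
    root = Path(lake_root)
    missing = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            if rel not in catalog:
                missing.append(rel)
    return missing


# Build a throwaway folder structure that mimics a small data lake.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("sales/2023/q1.csv", "sales/2023/q2.csv", "web/clicks.json"):
        f = Path(tmp, rel)
        f.parent.mkdir(parents=True, exist_ok=True)
        f.write_text("placeholder")
    # Hypothetical catalog entries with categories and tags.
    catalog = {
        "sales/2023/q1.csv": {"domain": "sales", "tags": ["finance", "quarterly"]},
        "web/clicks.json": {"domain": "marketing", "tags": ["clickstream"]},
    }
    print(find_uncataloged(tmp, catalog))  # ['sales/2023/q2.csv']
```

Running such a check on a schedule keeps newly landed files from silently drifting outside the catalog.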
                The primary reason for seeking a tool that can be “smart” is to account for external
             data that comes at us from the internet and to align it with internal corporate data
             to create meaningful insights, while keeping all compliance rules intact. These rules
             include GDPR, CCPA, financial regulations such as Basel III, the Safe Harbor Act, and
             more.
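One small part of what a “smart” tool might do for compliance is to flag columns in incoming external data that appear to contain personal data, so they can be tagged before being aligned with internal data. The sketch below is a hypothetical, deliberately crude rule-based classifier: the `PII_PATTERNS` regular expressions and the `min_share` threshold are illustrative assumptions, not requirements taken from GDPR, CCPA, or any other regulation.

```python
import re

# Hypothetical detection rules; real classifiers need far richer patterns.
PII_PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
    "phone": re.compile(r"^\+?[\d\s()-]{7,20}$"),
}


def tag_personal_data(columns, min_share=0.8):
    """Map column name -> PII tag when at least `min_share` of the non-empty
    values (as strings) match that tag's pattern; first matching tag wins."""
    tags = {}
    for name, values in columns.items():
        sample = [str(v).strip() for v in values if v not in (None, "")]
        if not sample:
            continue
        for tag, pattern in PII_PATTERNS.items():
            hits = sum(1 for v in sample if pattern.match(v))
            if hits / len(sample) >= min_share:
                tags[name] = tag
                break
    return tags


# Hypothetical columns from an external feed.
external_feed = {
    "contact": ["ann@example.com", "bob@example.org", "n/a"],
    "mobile": ["+1 555-0100", "+44 20 7946 0958"],
    "score": [0.4, 0.9, 0.7],
}
print(tag_personal_data(external_feed, min_share=0.6))
```

Patterns like these produce false positives (a date such as 2023-01-01 also fits the phone pattern), so flagged columns are best treated as candidates for review rather than final classifications.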

                                       FIGURE 11.7 Data from external sources.