Page 147 - Data Architecture

Chapter 4.3: Parallel Processing
           works only for repetitive data, not nonrepetitive data. Once the index on the repetitive
           data has been created, searching it is much more efficient than doing a full table scan,
           and big data no longer has to be fully scanned every time it is searched.
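           Here is a minimal Python sketch of the difference. It assumes the repetitive data is a list of
           records that all share the same layout. The call records and the `caller_id` field are
           hypothetical. A full scan examines every record on every search. The index is built once,
           as a dictionary from key value to record positions, and each lookup then goes straight to
           the matching records.

```python
from collections import defaultdict

# Hypothetical repetitive records: every record has the same structure.
records = [
    {"caller_id": "A17", "duration": 42},
    {"caller_id": "B03", "duration": 7},
    {"caller_id": "A17", "duration": 15},
    {"caller_id": "C88", "duration": 60},
]

def full_scan(records, field, value):
    """Examine every record on every search: O(n) work per query."""
    return [r for r in records if r[field] == value]

def build_index(records, field):
    """One pass over the data: map each field value to the positions holding it."""
    index = defaultdict(list)
    for pos, r in enumerate(records):
        index[r[field]].append(pos)
    return index

def indexed_lookup(records, index, value):
    """Jump directly to the matching records without touching the rest."""
    return [records[pos] for pos in index.get(value, [])]

idx = build_index(records, "caller_id")
print(indexed_lookup(records, idx, "A17"))
```

           The cost of scanning is paid once, when the index is built. After that, each search
           costs only as much as the number of matching records.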


           Of course, the index must be maintained. Every time repetitive data are added to the big
           data collection, the index must be updated as well.
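           One way to keep the index from going stale is to route every addition through a single
           method. That method appends the record and updates the index in the same step. The sketch
           below assumes append-only data and an index keyed on one field. The class name
           `IndexedCollection` and the `meter_id` field are hypothetical.

```python
from collections import defaultdict

class IndexedCollection:
    """Hypothetical wrapper that keeps an index in step with appended data."""

    def __init__(self, key_field):
        self.key_field = key_field
        self.records = []
        self.index = defaultdict(list)

    def add(self, record):
        # Every addition to the collection also updates the index;
        # appending without this step would leave the index stale.
        pos = len(self.records)
        self.records.append(record)
        self.index[record[self.key_field]].append(pos)

    def lookup(self, value):
        # Answer from the index only; no scan of self.records is needed.
        return [self.records[pos] for pos in self.index.get(value, [])]

coll = IndexedCollection("meter_id")
coll.add({"meter_id": "M1", "kwh": 3.2})
coll.add({"meter_id": "M2", "kwh": 1.1})
coll.add({"meter_id": "M1", "kwh": 2.9})
print(coll.lookup("M1"))
```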


           In addition, the designer must know what contextual information is available at the
           time the index is built.
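           The following sketch illustrates that constraint. The designer names the contextual fields
           up front, and the index can be built only if every record actually carries them. The
           sensor readings, their fields and the function `build_context_index` are hypothetical. A
           composite key made of several fields is one possible choice, not a prescribed method.

```python
from collections import defaultdict

def build_context_index(records, context_fields):
    """Build an index keyed on the contextual fields chosen by the designer.

    If a record lacks one of those fields, that context is not available
    at build time, so the index cannot be built on it.
    """
    index = defaultdict(list)
    for pos, record in enumerate(records):
        missing = [f for f in context_fields if f not in record]
        if missing:
            raise KeyError(f"record {pos} lacks contextual field(s): {missing}")
        key = tuple(record[f] for f in context_fields)
        index[key].append(pos)
    return dict(index)

# Hypothetical repetitive records: sensor readings with the same layout.
readings = [
    {"sensor": "S1", "site": "north", "value": 20.5},
    {"sensor": "S2", "site": "south", "value": 18.0},
    {"sensor": "S1", "site": "north", "value": 21.1},
]

by_sensor_site = build_context_index(readings, ["sensor", "site"])
print(by_sensor_site)  # {('S1', 'north'): [0, 2], ('S2', 'south'): [1]}
```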


           Fig. 4.3.9 shows the building of an index from the contextual data found in repetitive
           data.


               Fig. 4.3.9 Building an index on repetitive data.


           One issue with creating a separate index on repetitive data is that the resulting index is
           application-specific: the designer must know what data to look for before the index is
           built.
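           The sketch below uses hypothetical web log records to show why such an index is tied to
           the application it was designed for. An index keyed on `url` answers URL questions
           directly. A question about `status` was not anticipated when the index was built, so it
           still needs a full scan or a second index. The `query` function and its fallback rule
           are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical repetitive records: web log entries with the same layout.
log = [
    {"url": "/home", "status": 200},
    {"url": "/cart", "status": 500},
    {"url": "/home", "status": 404},
]

# Index built for one application: "which entries hit a given URL?"
url_index = defaultdict(list)
for pos, entry in enumerate(log):
    url_index[entry["url"]].append(pos)

def query(field, value):
    """Return (matching entries, access path used)."""
    if field == "url":
        return [log[pos] for pos in url_index.get(value, [])], "index"
    # Any other question was not anticipated when the index was built,
    # so it falls back to scanning every entry.
    return [e for e in log if e[field] == value], "full scan"

print(query("url", "/home")[1])  # index
print(query("status", 500)[1])   # full scan
```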


           Fig. 4.3.10 displays the application-specific nature of building an index for repetitive data
           in big data.



