Page 146 - Data Architecture

Chapter 4.3: Parallel Processing
           Fig. 4.3.8 shows the parsing of nonrepetitive data.

               Fig. 4.3.8 Parsing nonrepetitive data.


           The parsing of nonrepetitive data is an entirely different matter from the parsing of
           repetitive data. In fact, the parsing of nonrepetitive data is often referred to as textual
           disambiguation. There is much more to the reading of nonrepetitive data than merely
           parsing it.

           However it is done, nonrepetitive data are read and turned into a form that can be
           managed by a database management system.

           There is a very good reason why nonrepetitive data require far more than a parsing
           algorithm. The reason is that context in nonrepetitive data hides in many complex
           forms. For that reason, textual disambiguation is usually done external to the
           nonrepetitive data in big data. (In other words, because of the inherent complexity of
           nonrepetitive data, textual disambiguation is done outside of the database system that
           manages big data.)
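
           As a rough illustration of turning nonrepetitive text into rows that a database
           management system could store, the following is a minimal sketch only. The
           taxonomy, the function name disambiguate, and the row layout (document id, offset,
           term, context) are hypothetical choices made for this example; real textual
           disambiguation involves far more than a dictionary lookup.

```python
import re

# Hypothetical taxonomy: maps a raw term to the context it implies.
# A real taxonomy would be much larger and specific to the domain.
TAXONOMY = {
    "honda": "car",
    "toyota": "car",
    "aspirin": "medication",
}


def disambiguate(doc_id, text, taxonomy=TAXONOMY):
    """Return one row per recognized term: (doc_id, offset, term, context)."""
    rows = []
    for match in re.finditer(r"[A-Za-z]+", text):
        term = match.group().lower()
        context = taxonomy.get(term)
        if context is not None:
            # Keep the character offset so the term can be traced to its source.
            rows.append((doc_id, match.start(), term, context))
    return rows


rows = disambiguate("note-1", "Patient drove a Honda and took aspirin.")
for row in rows:
    print(row)
```

           Each row pairs a word with an explicit context, which is the kind of uniform
           structure a database can index and query. The step runs outside the store that
           holds the raw text, in line with the point made above.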


           A related issue to parallel processing in the big data environment is the efficiency
           of queries. As seen in Fig. 4.3.6, when a simple query is run against big data, the
           entire set of data contained in big data must be parsed. Even though the data are
           managed in parallel, such a full scan of the data consumes many machine resources.
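
           The cost of such a full scan can be sketched in a few lines. This is an illustrative
           model only: the partitions, the record layout, and the function names scan_partition
           and full_scan are assumptions made for the example, with a thread pool standing in
           for the parallel nodes of a big data environment.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_partition(partition, predicate):
    """Parse every record in one partition; return matches and records read."""
    matches = [record for record in partition if predicate(record)]
    return matches, len(partition)


def full_scan(partitions, predicate):
    """Run the same scan on all partitions in parallel and merge the results."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda p: scan_partition(p, predicate), partitions))
    matches = [m for part_matches, _ in results for m in part_matches]
    records_read = sum(count for _, count in results)
    return matches, records_read


# Hypothetical data spread across three partitions.
partitions = [
    [{"id": 1, "state": "TX"}, {"id": 2, "state": "CO"}],
    [{"id": 3, "state": "NM"}, {"id": 4, "state": "TX"}],
    [{"id": 5, "state": "UT"}],
]

matches, records_read = full_scan(partitions, lambda r: r["state"] == "TX")
print(len(matches), "matches after reading", records_read, "records")
```

           Parallelism shortens the elapsed time, but the total work does not shrink: every
           record in every partition is still read, even when only a few records satisfy
           the query.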


           An alternate approach is to scan the data once and create a separate index. This approach