Page 172 - Data Architecture

Chapter 4.6: Textual Disambiguation

               Fig. 4.6.7 A load utility.


           Document Fracturing/Named Value Processing



           Textual disambiguation performs many kinds of processing, but there are two primary
           paths for processing a document: document fracturing and named value processing.

           Document fracturing processes a document word by word, applying steps such as stop
           word processing, alternate spelling and acronym resolution, and homographic resolution.
           After fracturing, the document still has a recognizable shape, albeit in a modified
           form. For all practical purposes, it appears as if the document has been fractured.
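
As a rough illustration of the word-by-word pass described above, the following Python sketch applies stop word removal, alternate spelling resolution, and acronym resolution. It is a minimal sketch under stated assumptions, not the processing performed by any particular product: the function name `fracture` and the three lookup tables are hypothetical, and homographic resolution is omitted because it depends on context beyond a single word.

```python
import re

# Hypothetical lookup tables for illustration only; they are assumptions,
# not taken from the text. A real system would use far larger ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "was", "is", "in"}
ALTERNATE_SPELLINGS = {"colour": "color", "organisation": "organization"}
ACRONYMS = {"er": "emergency room", "bp": "blood pressure"}


def fracture(text):
    """Process a document word by word and return the surviving words in order."""
    result = []
    for word in re.findall(r"[A-Za-z0-9']+", text.lower()):
        if word in STOP_WORDS:
            continue  # stop word processing: drop words that carry little meaning
        word = ALTERNATE_SPELLINGS.get(word, word)  # alternate spelling resolution
        word = ACRONYMS.get(word, word)             # acronym resolution
        result.append(word)
    return result


if __name__ == "__main__":
    print(fracture("The patient was taken to the ER and the BP was high"))
```

Because the surviving words keep their original order, the output still follows the outline of the input sentence, which is the sense in which the document keeps a recognizable shape after fracturing.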


           The second major type of processing is named value processing. It is used when inline
           contextualization is needed, which is the case where the text is repetitive, as it
           sometimes is. Repetitive text can be processed by looking for unique beginning and
           ending delimiters.
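
The delimiter search described above can be sketched in Python as follows. This is a minimal sketch, assuming each value of interest sits between a known beginning delimiter and a known ending delimiter. The function name `extract_named_values`, the rule format, and the sample text are hypothetical and chosen only for illustration.

```python
import re


def extract_named_values(text, rules):
    """Return (name, value) pairs found between beginning and ending delimiters.

    rules is a list of (name, begin_delimiter, end_delimiter) tuples.
    This rule format is an assumption made for the example.
    """
    found = []
    for name, begin, end in rules:
        # Non-greedy match so each value stops at the first ending delimiter.
        pattern = re.escape(begin) + r"(.*?)" + re.escape(end)
        for match in re.finditer(pattern, text, flags=re.DOTALL):
            found.append((name, match.group(1).strip()))
    return found


if __name__ == "__main__":
    sample = "Contract No: 4471; Vendor: Acme Corp; Contract No: 4480; Vendor: Bolt Inc;"
    rules = [
        ("contract_number", "Contract No:", ";"),
        ("vendor", "Vendor:", ";"),
    ]
    for name, value in extract_named_values(sample, rules):
        print(name, "=", value)
```

Each extracted value is paired with the name supplied by its rule, so the context travels with the value. The approach only works when the delimiters are unique enough that they do not appear inside the values themselves.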


           There are other types of processing that can be done by textual disambiguation, but
           document fracturing and named value processing are the two primary analytic processing
           paths.


           Fig. 4.6.8 depicts the two primary forms of processing that occur in textual
