Page 173 - Data Architecture

Chapter 4.6: Textual Disambiguation
           disambiguation.

               Fig. 4.6.8 The two main processing components of textual ETL.



           Preprocessing a Document



           On occasion, the text of a document cannot be processed in a standard fashion by
           textual disambiguation. In these circumstances, the text must first pass through a
           preprocessor. There the text is edited until it can be processed in a normal manner
           by textual disambiguation.


           As a rule, you don’t want to preprocess text unless you absolutely have to.
           Preprocessing adds at least one extra pass over the text, so it automatically doubles
           (or more!) the machine cycles required to process the text.
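
           The trade-off above can be sketched in a few lines of Python. This is only an
           illustration under stated assumptions, not the way textual ETL is actually
           implemented: the function names (needs_preprocessing, preprocess, disambiguate,
           process_document) are hypothetical, the "problem" chosen here (words hyphenated
           across line breaks) is just one assumed example of text that cannot be processed
           in a standard fashion, and disambiguate is a trivial stand-in that only splits
           text into words. The point is that the preprocessor runs only when needed,
           because it costs an extra pass over the text.

```python
import re


def needs_preprocessing(text: str) -> bool:
    """Hypothetical check: is a word hyphenated across a line break?"""
    return re.search(r"\w-\n\w", text) is not None


def preprocess(text: str) -> str:
    """Hypothetical edit pass: rejoin words that were split across lines."""
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


def disambiguate(text: str) -> list:
    """Stand-in for textual disambiguation (assumption): split into words."""
    return re.findall(r"[A-Za-z0-9']+", text)


def process_document(text: str) -> tuple:
    """Run the stand-in pipeline; return (words, number of passes over the text)."""
    passes = 1  # the disambiguation pass always runs
    if needs_preprocessing(text):
        text = preprocess(text)
        passes += 1  # preprocessing costs an additional full pass
    return disambiguate(text), passes


if __name__ == "__main__":
    clean = "The patient was admitted."
    broken = "The pa-\ntient was admitted."
    print(process_document(clean))
    print(process_document(broken))
```

           In this sketch the clean document is read once, while the document that needs
           editing is read twice, which is the doubling of work described above.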


           Fig. 4.6.9 shows that electronic text can be preprocessed if necessary.
