Page 170 - Data Architecture

Chapter 4.6: Textual Disambiguation






























               Fig. 4.6.5 Iterative development.



           Input/Output



           The input to the process of textual disambiguation is electronic text. There are many
           forms of electronic text, and it can come from almost anywhere: proper language, slang,
           shorthand, comments, database entries, and many other forms. Textual disambiguation
           needs to be able to handle all of these forms. In addition, electronic text can be in
           different languages.


           Textual disambiguation can also handle nonelectronic text, once that text has passed
           through an automated capture mechanism such as optical character recognition (OCR)
           processing.


           The output of textual disambiguation can take many forms, but in every case it is
           created in a “flat file format.” As such, the output can be sent to any standard DBMS
           or to Hadoop.


           Fig. 4.6.6 shows the types of output that can be created from textual disambiguation.
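
To make the flat file idea concrete, the following is a minimal sketch of how disambiguated output might be written as a delimited flat file and then loaded into a standard DBMS. The column names (doc_id, byte_offset, word, context), the pipe delimiter, the sample rows, and the table name are all assumptions made for illustration; the text does not prescribe a record layout. SQLite stands in here for "any standard DBMS."

```python
import csv
import io
import sqlite3

# Hypothetical record layout; the chapter does not specify these columns.
FIELDS = ["doc_id", "byte_offset", "word", "context"]

# Hypothetical sample output rows, invented purely for illustration.
ROWS = [
    ("doc1", 0, "fracture", "diagnosis"),
    ("doc1", 42, "aspirin", "medication"),
    ("doc2", 7, "invoice", "finance"),
]


def write_flat_file(rows, out):
    """Write rows as a pipe-delimited flat file with a header line."""
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    writer.writerow(FIELDS)
    writer.writerows(rows)


def load_into_dbms(flat_text, conn):
    """Parse the flat file text and insert each record into a table."""
    reader = csv.DictReader(io.StringIO(flat_text), delimiter="|")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS disambiguated "
        "(doc_id TEXT, byte_offset INTEGER, word TEXT, context TEXT)"
    )
    records = [
        (r["doc_id"], int(r["byte_offset"]), r["word"], r["context"])
        for r in reader
    ]
    conn.executemany("INSERT INTO disambiguated VALUES (?, ?, ?, ?)", records)
    conn.commit()
    return len(records)


def demo():
    """Round-trip the sample rows through a flat file into SQLite."""
    buf = io.StringIO()
    write_flat_file(ROWS, buf)
    conn = sqlite3.connect(":memory:")
    count = load_into_dbms(buf.getvalue(), conn)
    contexts = [
        row[0]
        for row in conn.execute(
            "SELECT context FROM disambiguated ORDER BY doc_id, byte_offset"
        )
    ]
    return buf.getvalue(), count, contexts
```

Because the intermediate form is just delimited text, the same file could in principle be handed to a different relational loader or copied into a Hadoop file system without changing the step that produces it.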







