Page 75 - Building Big Data Applications

Chapter 2   Infrastructure and technology  69


                 document will contain all the data it needs to answer specific queries. Benefits
                 of this model include the following:
                   • Ability to store dynamic data in unstructured, semistructured, or structured
                     formats.
                   • Ability to create persisted views from a base document and store them for
                     analysis.
                   • Ability to store and process large data sets.
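The idea of a persisted view can be illustrated with a small sketch. It assumes that documents are plain Python dicts and that "persisting" a view means serializing the derived result as its own JSON document. The names `base_docs` and `make_view` and the sample fields are hypothetical and not taken from any particular database product.

```python
import json

# Hypothetical base documents: each is self-contained, and fields may differ.
base_docs = [
    {"order_id": 1, "customer": "alice", "items": [{"sku": "A1", "qty": 2}]},
    {"order_id": 2, "customer": "bob", "items": [{"sku": "B7", "qty": 1}],
     "coupon": "SPRING"},  # an extra field does not break the other documents
]

def make_view(docs, fields):
    """Project each base document onto `fields`, skipping fields it lacks."""
    return [{f: d[f] for f in fields if f in d} for d in docs]

# Derive a view for analysis and persist it as a separate JSON document.
customer_view = make_view(base_docs, ["order_id", "customer", "coupon"])
persisted = json.dumps({"view": "customers", "rows": customer_view})

print(persisted)
```

Because the view is stored as its own document, later analysis can read it without re-deriving it from the base documents.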


                 The design features of document-oriented databases
                 include the following:

                   • Schema free: there is no restriction on the structure or format in which the
                     data must be stored. This flexibility allows an evolving system to add new data
                     while retaining existing data in its current structure.
                   • Document stored: objects can be serialized and stored in a document; there is
                     no relational integrity to enforce.
                   • Ease of creation and maintenance: complex objects can be created once as a
                     single document, and little maintenance is needed after the document is created.
                   • No relationship enforcement: documents are independent of each other, and
                     there are no foreign key relationships to manage when executing queries. The
                     concurrency and performance issues that such relationships cause are avoided.
                   • Open formats: documents are described using JSON, XML, or a derivative of
                     these, making the process standard and clean from the start.
                   • Built-in versioning: documents can grow large and messy as versions accumulate.
                     To avoid conflicts and maintain processing efficiency, most solutions available
                     today implement versioning.
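Several of these design features appear together in the following toy in-memory store: schema freedom, serialized documents, no foreign keys, an open (JSON) format, and versioning. It is a minimal sketch under stated assumptions. The class `VersionedDocumentStore` and its methods are hypothetical, and they do not mirror the API of any real document database. Versioning is modeled here as "append a new snapshot on every save", which is only one of several possible strategies.

```python
import copy
import json

class VersionedDocumentStore:
    """Hypothetical toy store: no schema, no foreign keys, and every save
    appends a new version instead of overwriting the previous one."""

    def __init__(self):
        self._docs = {}  # doc_id -> list of versions, oldest first

    def save(self, doc_id, document):
        # Round-tripping through JSON checks that the object fits an open
        # format and yields a copy independent of the caller's object.
        snapshot = json.loads(json.dumps(document))
        self._docs.setdefault(doc_id, []).append(snapshot)
        return len(self._docs[doc_id])  # version number, starting at 1

    def get(self, doc_id, version=None):
        versions = self._docs[doc_id]
        if version is None:
            chosen = versions[-1]  # latest version by default
        elif 1 <= version <= len(versions):
            chosen = versions[version - 1]
        else:
            raise ValueError(f"no version {version} for {doc_id!r}")
        return copy.deepcopy(chosen)  # callers cannot mutate stored history

    def history(self, doc_id):
        return len(self._docs.get(doc_id, []))

store = VersionedDocumentStore()
store.save("user:42", {"name": "Ana", "email": "ana@example.com"})
# A later version adds a field; the first version keeps its original shape.
store.save("user:42", {"name": "Ana", "email": "ana@example.com",
                       "phone": "555-0100"})

print(store.history("user:42"))
print(store.get("user:42", version=1))
print(store.get("user:42"))
```

Keeping every snapshot makes conflicts easy to inspect, but it also lets documents grow, which is the "large and messy" problem the text mentions. Real products typically bound or compact their version history.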

                   Document databases express the data as files in JSON or XML format. This allows the
                 same document to be parsed for multiple contexts, with the results extracted and added
                 to the next iteration of the database.
                   Example usage: a document database can be used to store the results of clicks on the
                 web. For each log file that is parsed, a simple XML construct with the fields Page_Name,
                 Position_Coordinates, Clicks, Keywords, Incoming and Outgoing site, and date_time
                 creates a simple model for querying the number of clicks, keywords, dates, and links.
                 This kind of flexible processing is not readily available in an RDBMS. To capture URL
                 data as well, the next version of the document can simply add the field.
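The click-log example can be sketched with Python's standard `xml.etree.ElementTree`. The record layout is an assumption: the text lists the fields but not the exact XML. Here "Incoming and Outgoing site" is split into two hypothetical elements, `Incoming_Site` and `Outgoing_Site`. A `version` attribute and a `URL` element show how a later document version can add a field without touching older records. All sample values are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical parsed click-log records; element names follow the text.
CLICK_LOG = """
<clicks>
  <click version="1">
    <Page_Name>home</Page_Name>
    <Position_Coordinates>120,340</Position_Coordinates>
    <Clicks>3</Clicks>
    <Keywords>shoes,sale</Keywords>
    <Incoming_Site>search.example.com</Incoming_Site>
    <Outgoing_Site>shop.example.com</Outgoing_Site>
    <date_time>2012-05-01T10:15:00</date_time>
  </click>
  <click version="1">
    <Page_Name>product</Page_Name>
    <Position_Coordinates>80,200</Position_Coordinates>
    <Clicks>5</Clicks>
    <Keywords>shoes</Keywords>
    <Incoming_Site>home</Incoming_Site>
    <Outgoing_Site>cart</Outgoing_Site>
    <date_time>2012-05-01T10:16:30</date_time>
  </click>
  <click version="2">
    <Page_Name>cart</Page_Name>
    <Position_Coordinates>40,90</Position_Coordinates>
    <Clicks>1</Clicks>
    <Keywords>sale</Keywords>
    <Incoming_Site>product</Incoming_Site>
    <Outgoing_Site>checkout</Outgoing_Site>
    <date_time>2012-05-01T10:18:05</date_time>
    <URL>https://shop.example.com/cart</URL>
  </click>
</clicks>
"""

root = ET.fromstring(CLICK_LOG.strip())

clicks_per_keyword = Counter()
clicks_per_day = Counter()
urls = []
for click in root.iter("click"):
    n = int(click.findtext("Clicks", default="0"))
    for kw in click.findtext("Keywords", default="").split(","):
        if kw:
            clicks_per_keyword[kw] += n
    # date_time starts with YYYY-MM-DD, so the first 10 characters are the day.
    clicks_per_day[click.findtext("date_time")[:10]] += n
    # URL exists only in version-2 records; older records simply lack it.
    url = click.findtext("URL")
    if url is not None:
        urls.append(url)

print(dict(clicks_per_keyword))
print(dict(clicks_per_day))
print(urls)
```

The query code tolerates both record versions side by side. Adding the URL field therefore requires no migration of the version-1 documents.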
                   The emergence of document databases is still ongoing at the time of writing (2012),
                 and market adoption of this technology is expected soon. We will discuss the
                 integration architecture for this technology in the latter half of this book.