Page 75 - Building Big Data Applications


Chapter 2 Infrastructure and technology 69

document will contain all the data it needs to answer specific query questions. Benefits
of this model include the following:
• Ability to store dynamic data in unstructured, semistructured, or structured formats.
• Ability to create persisted views from a base document and store them for analysis.
• Ability to store and process large data sets.

The design features of document-oriented databases
include the following:

• Schema free: There is no restriction on the structure and format in which the data
  is stored. This flexibility allows an evolving system to add more data while the
  existing data is retained in its current structure.
• Document store: Objects can be serialized and stored in a document; there is no
  relational integrity to enforce.
• Ease of creation and maintenance: Complex objects can be created once as a simple
  document, and minimal maintenance is needed after the document is created.
• No relationship enforcement: Documents are independent of each other, and
  there are no foreign key relationships to worry about when executing queries. The
  concurrency and performance issues associated with such relationships do not arise here.
• Open formats: Documents are described using JSON, XML, or some derivative,
  making the process standard and clean from the start.
• Built-in versioning: Documents can get large and messy as versions accumulate. To
  avoid conflicts and keep processing efficient, most solutions available today
  implement versioning.
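
The schema-free, document-store, and versioning features listed above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory `DocumentStore` class written only for this illustration; it is not the API of any particular document database, and the document ids and fields are invented.

```python
import json


class DocumentStore:
    """Hypothetical in-memory document store (illustration only)."""

    def __init__(self):
        # Maps document id -> list of serialized versions, oldest first.
        self._versions = {}

    def put(self, doc_id, document):
        """Serialize the object to JSON and append it as a new version."""
        serialized = json.dumps(document, sort_keys=True)
        self._versions.setdefault(doc_id, []).append(serialized)
        return len(self._versions[doc_id])  # 1-based version number

    def get(self, doc_id, version=None):
        """Return the latest version, or a specific 1-based version."""
        history = self._versions[doc_id]
        serialized = history[-1] if version is None else history[version - 1]
        return json.loads(serialized)


store = DocumentStore()

# Schema free: the two documents share no fixed structure.
store.put("customer-1", {"name": "Ada", "emails": ["ada@example.com"]})
store.put("order-7", {"items": [{"sku": "A1", "qty": 2}], "total": 19.98})

# An evolving system adds a field; the earlier version is retained unchanged.
v2 = store.put(
    "customer-1",
    {"name": "Ada", "emails": ["ada@example.com"], "tier": "gold"},
)
```

Each document carries everything needed to read it back, so no foreign keys or joins are involved, and older versions stay readable in the structure they were written in.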

Document databases express the data as files in JSON or XML formats. This allows the
same document to be parsed for multiple contexts and the results scraped and added to
the next iteration of the database data.
Example usage: A document database can be used to store the results of clicks on the
web. For each log file that is parsed, a simple XML construct with the Page_Name,
Position_Coordinates, Clicks, Keywords, Incoming and Outgoing site, and date_time will
create a simple model to query the number of clicks, keywords, date, and links. This
flexibility is hard to match in an RDBMS with a fixed schema. If you want to expand and
capture the URL data, the next version can add the field.
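
The click-log example can be sketched as follows, using Python's standard `xml.etree.ElementTree` module. The element names follow the fields named in the text; the `Click_Record` root element, the `Incoming_Site`/`Outgoing_Site` spellings, the helper functions, and all sample values are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def make_click_document(page_name, position, clicks, keywords,
                        incoming_site, outgoing_site, date_time, url=None):
    """Build one XML click document; `url` is the field added in a later version."""
    doc = ET.Element("Click_Record")  # root element name is an assumption
    ET.SubElement(doc, "Page_Name").text = page_name
    ET.SubElement(doc, "Position_Coordinates").text = position
    ET.SubElement(doc, "Clicks").text = str(clicks)
    ET.SubElement(doc, "Keywords").text = ",".join(keywords)
    ET.SubElement(doc, "Incoming_Site").text = incoming_site
    ET.SubElement(doc, "Outgoing_Site").text = outgoing_site
    ET.SubElement(doc, "date_time").text = date_time
    if url is not None:
        ET.SubElement(doc, "URL").text = url
    return ET.tostring(doc, encoding="unicode")


def clicks_per_page(xml_documents):
    """Sum the Clicks field per Page_Name across all documents."""
    totals = Counter()
    for xml_text in xml_documents:
        doc = ET.fromstring(xml_text)
        totals[doc.findtext("Page_Name")] += int(doc.findtext("Clicks"))
    return totals


documents = [
    make_click_document("home", "120,340", 3, ["big data"],
                        "search.example", "shop.example", "2012-06-01T10:15:00"),
    make_click_document("home", "80,200", 2, ["nosql"],
                        "news.example", "shop.example", "2012-06-01T11:00:00"),
    # A later version of the document adds the URL field; older documents
    # are left as they are and remain queryable.
    make_click_document("pricing", "10,20", 5, ["nosql", "xml"],
                        "search.example", "blog.example", "2012-06-02T09:30:00",
                        url="http://shop.example/pricing"),
]

totals = clicks_per_page(documents)
```

The query function reads only the fields it needs, so documents with and without the URL field can sit side by side without any migration step.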
Document databases are still emerging at the time of writing (2012), and market
adoption of this technology is expected to follow soon. We will discuss the
integration architecture for this technology in the latter half of this book.
