Page 76 - Building Big Data Applications
Graph databases
Social media and the rise of Facebook, LinkedIn, and Twitter have accelerated the
emergence of the most complex NoSQL database, the graph database. The graph
database is oriented toward modeling and deploying data that is a graph by construction.
For example, to represent a person and their friends in a social network, we can either
write code to convert the social graph into key-value pairs on Dynamo or Cassandra, or
simply convert it into a node-edge model in a graph database, where managing the
relationship representation is much simpler.
A graph database represents each object as a node and each relationship as an edge. This
means a person is a node and a household is a node, and the relationship between the two is an edge.
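The node-edge idea above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the API of any graph database product. The class name Graph, the relationship names MEMBER_OF and FRIEND_OF, and the sample people and household are all assumptions made for the example.

```python
from collections import defaultdict


class Graph:
    """A tiny in-memory node-edge store (illustrative only)."""

    def __init__(self):
        self.nodes = {}                 # node id -> dict with a label and properties
        self.edges = defaultdict(list)  # source id -> [(relationship, target id)]

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = {"label": label, **props}

    def add_edge(self, source, relationship, target):
        # Both endpoints must already exist as nodes.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[source].append((relationship, target))

    def neighbors(self, source, relationship):
        # Follow outgoing edges of one relationship type from a node.
        return [t for rel, t in self.edges.get(source, []) if rel == relationship]


g = Graph()
g.add_node("alice", "Person", name="Alice")   # a person is a node
g.add_node("bob", "Person", name="Bob")
g.add_node("h1", "Household", city="Chicago")  # a household is a node
g.add_edge("alice", "MEMBER_OF", "h1")         # the relationship is an edge
g.add_edge("bob", "MEMBER_OF", "h1")
g.add_edge("alice", "FRIEND_OF", "bob")

print(g.neighbors("alice", "FRIEND_OF"))  # ['bob']
print(g.neighbors("alice", "MEMBER_OF"))  # ['h1']
```

In a key-value store the same lookup would require application code to encode and decode the relationship lists under each key. Here the relationship is a first-class element that can be followed directly.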
Data model: like the classic ER model for an RDBMS, we need to create an attribute
model for a graph database. We can start by taking the highest level in a hierarchy as a
root node (akin to an Entity) and connect each attribute as its subnode. To represent
different levels of the hierarchy, we can add a subcategory or subreference and create
another list of attributes at that level. This creates a natural traversal model like a tree
traversal, which is similar to traversing a graph. Depending on the cyclic property of the
graph, we can have a balanced or skewed model. Some of the most evolved graph
databases include Neo4j, InfiniteGraph, GraphDB, and AllegroGraph.
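As a rough sketch of the attribute model described above, the following Python builds a root node, attaches attributes as subnodes, adds one subcategory with its own attributes, and walks the result depth-first. The names AttributeNode and traverse, and the Customer/Address sample data, are hypothetical and chosen only for illustration.

```python
class AttributeNode:
    """A node in the attribute hierarchy: an entity, subcategory, or attribute."""

    def __init__(self, name, value=None):
        self.name = name
        self.value = value
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child


def traverse(node, depth=0):
    """Depth-first, pre-order walk yielding (depth, name, value)."""
    yield depth, node.name, node.value
    for child in node.children:
        yield from traverse(child, depth + 1)


customer = AttributeNode("Customer")              # root node, akin to an Entity
customer.add(AttributeNode("name", "Alice"))      # attribute as a subnode
address = customer.add(AttributeNode("Address"))  # subcategory at the next level
address.add(AttributeNode("city", "Chicago"))
address.add(AttributeNode("zip", "60601"))

rows = list(traverse(customer))
for depth, name, value in rows:
    print("  " * depth + name + ("" if value is None else f" = {value}"))
```

This walk assumes the hierarchy is a tree. A general graph that contains cycles would also need a record of visited nodes to keep the traversal from looping.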
There are additional Hadoop committers and distributors, such as MapR; these
architectures will be covered in the Appendix.
As we conclude this chapter, we have seen the different technologies that are available to
process big data, their specific capabilities, and their architectures. In the next chapter,
we will study some use cases from real-life implementations of solutions. In the second
half of this book, we will see how these technologies will enrich the data warehouse and
data management with big data integration.
For continued reading on specific vendors for NoSQL databases, please check their
websites.
Additional reading
Hive: A Data Warehousing Solution Over a MapReduce Framework. Facebook Data Infrastructure Team.
Apache Software Foundation Page.
A. Pavlo et al. A Comparison of Approaches to Large-Scale Data Analysis. Proc. ACM SIGMOD, 2009.
R. Chaiken et al. SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets. Proc. VLDB
Endowment, 1(2):1265–1276, 2008.
J. Dean and S. Ghemawat. MapReduce: A flexible data processing tool. Communications of the ACM, 53(1):
72–77, 2010.
D.J. DeWitt and M. Stonebraker. MapReduce: A major step backwards. The Database Column, 1, 2008.
E. Friedman, P. Pawlowski, and J. Cieslewicz. SQL/MapReduce: A practical approach to self-describing,
polymorphic, and parallelizable user-defined functions. Proceedings of the VLDB Endowment, 2(2):
1402–1413, 2009.