
Graph databases

Social media and the emergence of Facebook, LinkedIn, and Twitter have accelerated the rise of the most complex NoSQL database, the graph database. The graph database is oriented toward modeling and deploying data that is graphical by construct. For example, to represent a person and their friends in a social network, we can either write code to convert the social graph into key-value pairs on Dynamo or Cassandra, or simply convert it into a node-edge model in a graph database, where managing the relationship representation is much simpler.

A graph database represents each object as a node and each relationship as an edge; this means a person is a node, a household is a node, and the relationship between the two is an edge.
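
To make the contrast concrete, the following is a minimal Python sketch, not tied to any particular product, comparing the two approaches: a key-value flattening of the kind one might hand-roll on Dynamo or Cassandra, and a node-edge model in which relationships are first-class. The identifiers and relationship names (Person, Household, KNOWS, MEMBER_OF) are illustrative assumptions, not the schema of any specific database.

# Illustrative sketch only; names and keys are hypothetical.

# 1. Key-value style: the graph is flattened into keys and serialized values,
#    so the application code must parse and maintain the relationships itself.
key_value_store = {
    "person:1": {"name": "Alice", "friends": ["person:2"], "household": "household:10"},
    "person:2": {"name": "Bob", "friends": ["person:1"], "household": "household:10"},
    "household:10": {"address": "42 Main St"},
}

# 2. Node-edge style: every object is a node and every relationship is an edge,
#    so "which household does Alice belong to?" is a direct edge lookup.
nodes = {
    "person:1": {"label": "Person", "name": "Alice"},
    "person:2": {"label": "Person", "name": "Bob"},
    "household:10": {"label": "Household", "address": "42 Main St"},
}
edges = [
    ("person:1", "KNOWS", "person:2"),
    ("person:1", "MEMBER_OF", "household:10"),
    ("person:2", "MEMBER_OF", "household:10"),
]

def neighbors(node_id, relationship):
    """Return the nodes connected to node_id by the given relationship type."""
    return [dst for src, rel, dst in edges if src == node_id and rel == relationship]

print(neighbors("person:1", "MEMBER_OF"))   # ['household:10']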

Data model: like the classic ER model for an RDBMS, we need to create an attribute model for a graph database. We can start by taking the highest level in a hierarchy as a root node (akin to an entity) and connect each attribute as its subnode. To represent different levels of the hierarchy we can add a subcategory or subreference and create another list of attributes at that level. This creates a natural traversal model like a tree traversal, which is similar to traversing a graph. Depending on the cyclic property of the graph, we can have a balanced or skewed model. Some of the most evolved graph databases include Neo4j, InfiniteGraph, GraphDB, and AllegroGraph.
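
As a minimal sketch of the attribute model just described, the Python below builds a root node with its attribute subnodes, a subcategory with its own list of attributes one level deeper, and a depth-first traversal of the kind the text compares to tree traversal. The entity and attribute names (Customer, address, and so on) are hypothetical, and an in-memory tree stands in for a real graph database.

# Illustrative sketch only; entity and attribute names are assumptions.
from collections import namedtuple

Node = namedtuple("Node", ["name", "children"])

# Root node = highest level of the hierarchy (akin to an entity);
# each attribute hangs off it as a subnode, and a subcategory node
# introduces another list of attributes one level deeper.
attribute_model = Node("Customer", [
    Node("name", []),
    Node("email", []),
    Node("address", [            # subcategory with its own attributes
        Node("street", []),
        Node("city", []),
        Node("postal_code", []),
    ]),
])

def traverse(node, depth=0):
    """Depth-first walk of the attribute tree, mirroring a graph traversal."""
    print("  " * depth + node.name)
    for child in node.children:
        traverse(child, depth + 1)

traverse(attribute_model)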

There are additional Hadoop committers and distributors, such as MapR, and these architectures will be covered in the Appendix.

As we conclude this chapter, we have seen the different technologies that are available to process big data, their specific capabilities, and their architectures. In the next chapter we will study some use cases from real-life implementations of solutions. In the second half of this book we will see how these technologies enrich the data warehouse and data management with big data integration.

For continued reading on specific vendors of NoSQL databases, please check their websites.

Additional reading

Hive: A Data Warehousing Solution Over a MapReduce Framework. Facebook Data Infrastructure Team; Apache Software Foundation page.
A. Pavlo et al. A Comparison of Approaches to Large-Scale Data Analysis. Proc. ACM SIGMOD, 2009.
R. Chaiken et al. SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets. Proc. VLDB Endowment, 1(2):1265-1276, 2008.
J. Dean and S. Ghemawat. MapReduce: A Flexible Data Processing Tool. Communications of the ACM, 53(1):72-77, 2010.
D.J. DeWitt and M. Stonebraker. MapReduce: A Major Step Backwards. The Database Column, 2008.
E. Friedman, P. Pawlowski, and J. Cieslewicz. SQL/MapReduce: A Practical Approach to Self-Describing, Polymorphic, and Parallelizable User-Defined Functions. Proceedings of the VLDB Endowment, 2(2):1402-1413, 2009.