ERIC Number: ED558322
Record Type: Non-Journal
Publication Date: 2013
Pages: 167
Abstractor: As Provided
Reference Count: N/A
ISBN: 978-1-3032-4965-5
GraphStore: A Distributed Graph Storage System for Big Data Networks
Martha, VenkataSwamy
ProQuest LLC, Ph.D. Dissertation, University of Arkansas at Little Rock
Networks, such as social networks, are a universal solution for modeling complex problems in real time, especially in the Big Data community. While previous studies have attempted to enhance network processing algorithms, none have paved a path for the development of a persistent storage system. The proposed solution, GraphStore, provides an efficient and scalable graph management system for giant networks using Hadoop, a widely accepted distributed big data processing framework. Unlike the existing graph processing systems on Hadoop, GraphStore offers persistence, on top of which query processing modules can be implemented. This study faces several challenges, such as skewed workload distribution and non-uniform network data distribution, which arise from intrinsic properties of networks like the power law degree distribution. Remedies for these issues are also addressed in this dissertation. GraphStore advances the technology in network science through the following contributions: (i) an efficient storage structure to store massive graphs on top of HDFS; (ii) a novel Congestion Prevention Detection and Avoidance for Balancing (CPDAB) algorithm for clients to avoid and detect skewed workload from requests in HDFS; (iii) a novel hierarchical-MapReduce (h-MapReduce) to address the significant skewed workload problem in processing networks due to the intrinsic power law degree distribution found in substantially large networks. Altogether, the proposed solutions make GraphStore a complete, one-stop solution for graph storage and processing. Experiments on several real network datasets, including the Twitter user-follower network and the Wikipedia page-links network, demonstrated that GraphStore achieves a significant gain over HBase in query response time. In addition, the SCAN algorithm for clustering networks was ported to GraphStore and evaluated, which further demonstrates the usability of the system.
The proposed GraphStore thereby enables researchers to process massive graphs on a limited cluster of commodity computers. The ability to warehouse Big Data networks creates opportunities from which both the data mining research community and society as a whole can benefit. The proposed system empowers Small and Medium Enterprises and small research groups with limited resources to process giant graphs for their purposes. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone: 1-800-521-0600. Web page:]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site:
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A