Research On Data Extraction And Distributed Graph Data Management

Posted on:2017-04-20

Degree:Master

Type:Thesis

Country:China

Candidate:Q L Ding

Full Text:PDF

GTID:2278330488466894

Subject:Computer software and theory

Abstract/Summary:

Graph databases are a member of the NoSQL family and have a distinct advantage in handling complex, highly interrelated data: they support fast queries and efficient use of big data whose structure resembles a graph. How to quickly extract, transform, and load (ETL) relational data into graph data, and how to efficiently analyze and use that graph data, are two important problems in research on graph data applications.

Although there is domestic and international research on ETL, existing work has several problems: 1) the converted graph data is of poor quality; 2) the transformation is inefficient; 3) the transformed results are not suitable for distributed storage. As for efficiently analyzing and using graph data, most methods fall short in distributed storage and distributed computing of graph data. This thesis therefore focuses on improving the design of the ETL method and on efficient management of large-scale graph data. The key contributions are as follows:

(1) To overcome the limitations of current ETL methods, a sub-schema-based ETL method for transforming relational data into graph data is proposed. By splitting the schema of a relational database into several sub-schemas, the method improves the algorithm and procedure of the traditional ETL method and provides an efficient approach to parallel ETL. The transformed results satisfy the requirements of distributed storage and can serve as base data for the Spark GraphX computing framework.

(2) For complex graph data, the thesis designs a distributed graph data management method based on a graph database, which manages distributed storage and schedules a distributed computing framework for data analysis.

Finally, J2EE and Neo4j were used to implement an ETL prototype system for experimental verification (referred to as BSS-ETLS), and Neo4j and Spark GraphX were used to implement a prototype system for the management method (referred to as GCDMS). Experimental results show that the improved ETL method performs better than traditional methods, and that GCDMS has clear advantages in handling massive, strongly structured graph data.
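
The sub-schema idea in contribution (1) can be illustrated with a minimal Python sketch. The abstract does not say how the schema is split, so the sketch assumes that each sub-schema is a connected component of the foreign-key graph, that every table has an `id` primary key, and that rows become nodes while foreign-key values become edges. The toy schema, the data, and all function names are hypothetical and are not taken from BSS-ETLS.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy schema: table name -> list of (fk_column, referenced_table).
FOREIGN_KEYS = {
    "customer": [],
    "orders": [("customer_id", "customer")],
    "sensor": [],
    "reading": [("sensor_id", "sensor")],
}

# Hypothetical rows; every table is assumed to have an "id" primary key.
ROWS = {
    "customer": [{"id": 1, "name": "Ann"}],
    "orders": [{"id": 10, "customer_id": 1}, {"id": 11, "customer_id": 1}],
    "sensor": [{"id": 7, "kind": "temp"}],
    "reading": [{"id": 70, "sensor_id": 7}],
}


def split_into_sub_schemas(foreign_keys):
    """Assumed splitting rule: connected components of the foreign-key graph."""
    neighbours = defaultdict(set)
    for table, fks in foreign_keys.items():
        neighbours[table]  # register tables that have no foreign keys
        for _, target in fks:
            neighbours[table].add(target)
            neighbours[target].add(table)
    seen, components = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first walk over linked tables
            table = stack.pop()
            if table in component:
                continue
            component.add(table)
            stack.extend(neighbours[table] - component)
        seen |= component
        components.append(sorted(component))
    return components


def transform_sub_schema(tables, foreign_keys, rows):
    """Turn each row into a node and each non-null foreign key into an edge."""
    nodes, edges = [], []
    for table in tables:
        fks = foreign_keys.get(table, [])
        fk_columns = {column for column, _ in fks}
        for row in rows.get(table, []):
            node_id = (table, row["id"])
            properties = {k: v for k, v in row.items() if k not in fk_columns}
            nodes.append((node_id, properties))
            for column, target in fks:
                if row.get(column) is not None:
                    edges.append((node_id, column, (target, row[column])))
    return nodes, edges


def run_etl(foreign_keys, rows):
    """Transform the sub-schemas independently, one worker task per sub-schema."""
    sub_schemas = split_into_sub_schemas(foreign_keys)
    with ThreadPoolExecutor() as pool:
        results = list(
            pool.map(
                lambda tables: transform_sub_schema(tables, foreign_keys, rows),
                sub_schemas,
            )
        )
    return {tuple(tables): result for tables, result in zip(sub_schemas, results)}
```

Because no foreign key crosses a component boundary under this assumption, each sub-schema yields a self-contained node list and edge list. That is one plausible reason such output could be partitioned for distributed storage or loaded as vertex and edge collections in a framework like Spark GraphX.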

Keywords/Search Tags:

Graph Database, Distributed Computing, BSS-ETLS, GCDMS

Related items

1	Hybrid Graph Query And Graph Computing Engine For Distributed Graph Database
2	Distributed Data Process In Graph Database
3	Design And Implementation Of Critical Technologies In Distributed Graph Database
4	Design And Implementation Of Distributed Graph Computing Engine
5	Graph Reachability Distributed Computing And Application Based On Spark
6	Distributed Graph Database Storage Layer Design And Implementation
7	Design And Implementation Of Graph Computing Platform Framework Based On Graph Database
8	Conducting Graph Analytics In Graph Database Systems By Using GPU-based Accelerators
9	PyGel:a Distributed Graph Computing Engine Based On DPark
10	Research On Distributed Graph Computing Performance Optimization For Natural Graphs