HBase Support The Storage And Query Of Graph Data

Posted on:2017-09-03

Degree:Master

Type:Thesis

Country:China

Candidate:H Chen

GTID:2348330518494711

Subject:Information and Communication Engineering

Abstract/Summary:

In recent years, with the development of the Internet, graphs have been used to describe many complex real-world data objects and their relationships, including social networks, chemical structures, and computer vision. Many Internet services, such as e-commerce, search engines, and knowledge graphs, also use graph structures widely. As a result, industry is paying more and more attention to graph computing. However, the accumulation of many types of data has made graph data grow rapidly in size. Industry wants to store these large-scale data for data mining, which uses the data to create value and provide users with better services. To that end, it has done in-depth theoretical research and much engineering practice in data storage and computing. Google has published a series of papers on its big data infrastructure, such as GFS and BigTable. The open-source projects Hadoop and HBase, as realizations of these ideas, have drawn increasing attention and are widely used to store large volumes of user data. This infrastructure can efficiently store files and unstructured data. For strongly coupled data, however, and especially at very large scale, storage and querying become a key issue.

To address this problem, this thesis analyzes in depth the HDFS distributed file system, the HBase distributed database, and the Spark graph computing engine. It then designs an HBase-based system, called G-HBase, to store and query graph data. The system uses HDFS as the underlying data storage, which ensures high availability and excellent scalability. It adopts HBase as the indexing layer to locate the required data efficiently. G-HBase manages vertices and edges along both the property and time dimensions. It effectively supports data extraction and storage for the Spark graph computing engine. Its API integrates seamlessly with Spark, which gives data scientists a good working environment, reduces the cost of data management, and improves work efficiency. Compared with some open-source data warehouse management systems such as Hive, G-HBase includes optimizations specific to graph data. Finally, this thesis carries out performance tests of the system. The experimental results confirm that G-HBase performs better on range queries than some existing industry solutions.
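The abstract does not give G-HBase's table schema. The following is a minimal sketch, under stated assumptions, of how HBase can serve as an index for edges along the time dimension and why range queries can be cheap. It assumes a hypothetical row-key layout `<src>|<label>|<timestamp>|<dst>`. It emulates HBase's lexicographically sorted row keys and start/stop-row scans with an in-memory sorted list, so no HBase client is involved. The names `edge_key`, `EdgeTable`, and `edges_in_time_range` are illustrative and are not taken from the thesis.

```python
import bisect
from typing import Dict, Iterator, List, Tuple

# Hypothetical edge row-key layout (an assumption, not the thesis schema):
#   <src vertex id>|<edge label>|<timestamp>|<dst vertex id>
# Fixed-width, zero-padded numeric fields make lexicographic order match
# numeric order, mirroring how HBase sorts row keys. Labels are assumed
# not to contain the "|" separator.


def edge_key(src: int, label: str, ts: int, dst: int) -> str:
    return f"{src:016d}|{label}|{ts:020d}|{dst:016d}"


class EdgeTable:
    """In-memory stand-in for an HBase table: row keys kept sorted."""

    def __init__(self) -> None:
        self._keys: List[str] = []
        self._rows: Dict[str, Dict[str, str]] = {}

    def put(self, src: int, label: str, ts: int, dst: int,
            props: Dict[str, str]) -> None:
        key = edge_key(src, label, ts, dst)
        if key not in self._rows:
            bisect.insort(self._keys, key)  # keep keys in sorted order
        self._rows[key] = dict(props)  # overwrite properties on re-put

    def scan(self, start_row: str,
             stop_row: str) -> Iterator[Tuple[str, Dict[str, str]]]:
        """Yield rows with start_row <= key < stop_row, like an HBase Scan."""
        i = bisect.bisect_left(self._keys, start_row)
        while i < len(self._keys) and self._keys[i] < stop_row:
            key = self._keys[i]
            yield key, self._rows[key]
            i += 1

    def edges_in_time_range(self, src: int, label: str, t_start: int,
                            t_end: int) -> List[Tuple[int, int, Dict[str, str]]]:
        """Edges of `src` with `label` and t_start <= ts < t_end."""
        start = f"{src:016d}|{label}|{t_start:020d}|"
        stop = f"{src:016d}|{label}|{t_end:020d}|"
        result = []
        for key, props in self.scan(start, stop):
            _, _, ts, dst = key.split("|")
            result.append((int(dst), int(ts), props))
        return result


if __name__ == "__main__":
    table = EdgeTable()
    table.put(1, "follows", 100, 2, {"weight": "0.5"})
    table.put(1, "follows", 200, 3, {"weight": "0.9"})
    table.put(1, "follows", 300, 4, {})
    table.put(2, "follows", 150, 1, {})
    print(table.edges_in_time_range(1, "follows", 100, 300))
    # [(2, 100, {'weight': '0.5'}), (3, 200, {'weight': '0.9'})]
```

In this layout, the timestamp sits directly after the source vertex and edge label. A time-window query over one vertex's edges therefore becomes a single contiguous scan instead of a full table read. A real deployment would also need to handle label encoding and region hotspotting, which this sketch ignores.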

Keywords/Search Tags:

data warehouse, graph data, hbase

Related items

1	The Research On Big Graph Data Management Based On HBase
2	Hbase Based Credible Dataware Construction Of Business Quarterly And OLAP Query Analysis
3	Research And Implementation Of DSP Data Warehouse Optimization Based On Spark
4	The Design And Implementation Of A Data Service System
5	Research On Data Processing Technology Based On HBase
6	Research On Data Compression Technology Based On HBase
7	Research On Modeling ETL Process In Data Warehouse
8	The Research On HBase Data Recovery Technology Based On Data Storage Characteristic
9	The Research And Implementation Of The RDF Data Storage And Query Based On Graph
10	The Analysis And Optimization Of Load Balancing Algorithm For Big Data Platform Based On Hbase