HBase Support The Storage And Query Of Graph Data

Posted on: 2017-09-03
Degree: Master
Type: Thesis
Country: China
Candidate: H Chen
GTID: 2348330518494711
Subject: Information and Communication Engineering
Abstract/Summary:
In recent years, with the development of the Internet, graphs have been used to describe many complex real-world data objects and their relationships, including social networks, chemical structures, and computer vision. Many Internet services, such as e-commerce, search engines, and knowledge graphs, also make wide use of graph structures. As a result, graph computing is receiving more and more attention from industry. However, the accumulation of many types of data has caused graph data to grow rapidly in size. Such large-scale data must be stored so that data mining can use it to create value and provide users with better services. To meet this need, industry has carried out in-depth theoretical research and many engineering projects in data storage and computing. Google has published a series of papers on its big-data infrastructure, including GFS and BigTable. The open-source projects Hadoop and HBase implement these ideas. They have drawn increasing attention and are widely used to store large volumes of user data. This infrastructure stores files and unstructured data efficiently. For strongly coupled data, however, how to store and query it becomes a key issue, especially at very large scale.

To address this problem, the thesis analyzes the HDFS distributed file system, the HBase distributed database, and the Spark graph computing engine in depth. On that basis it designs an HBase-based system called G-HBase for storing and querying graph data. The system uses HDFS as the underlying storage, which provides high availability and good scalability. It uses HBase as the indexing layer so that the required data can be located efficiently. G-HBase manages vertices and edges along both the property and time dimensions. It also effectively supports data extraction and storage for the Spark graph computing engine. Its API integrates seamlessly with Spark, which gives data scientists a convenient working environment, reduces the cost of data management, and improves work efficiency. Compared with open-source data warehouse management systems such as Hive, G-HBase includes optimizations specific to graph data.

Finally, the thesis reports performance tests of the system. The experimental results show that G-HBase performs better on range queries than some existing industry solutions.
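The abstract does not describe G-HBase's actual table schema. The sketch below is therefore only a hypothetical illustration of how an HBase-style index can serve time-range queries over edges. HBase keeps rows sorted by row-key bytes, and a scan with a start row and an exclusive stop row returns a contiguous key range. If an edge's row key is built as source vertex, then label, then timestamp, then destination vertex, all edges of one vertex and label within a time window fall into a single contiguous range. The key layout and the names `edge_key`, `SortedRowStore` and `edges_in_time_range` are assumptions made for this example. A sorted in-memory list stands in for a real HBase table.

```python
import bisect
from typing import Dict, List, Tuple

# Hypothetical edge row-key layout (an assumption, not taken from the thesis):
#   <src id, 10 digits>|<label>|<timestamp, 13 digits>|<dst id, 10 digits>
# Fixed-width, zero-padded numbers make byte order agree with numeric order.
# Labels are assumed not to contain the separator byte.
SEP = b"|"


def edge_key(src: int, label: str, ts: int, dst: int) -> bytes:
    """Build the hypothetical row key for one edge."""
    return SEP.join([
        f"{src:010d}".encode(),
        label.encode(),
        f"{ts:013d}".encode(),
        f"{dst:010d}".encode(),
    ])


class SortedRowStore:
    """In-memory stand-in for an HBase table: rows kept sorted by key bytes."""

    def __init__(self) -> None:
        self._keys: List[bytes] = []
        self._values: List[Dict[str, float]] = []

    def put(self, key: bytes, value: Dict[str, float]) -> None:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value  # overwrite an existing row
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def scan(self, start: bytes, stop: bytes) -> List[Tuple[bytes, Dict[str, float]]]:
        """Rows with start <= key < stop, mimicking a start/stop-row scan."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))


def edges_in_time_range(store: SortedRowStore, src: int, label: str,
                        t_start: int, t_end: int):
    """Edges of `src` with `label` whose timestamp lies in [t_start, t_end)."""
    prefix = f"{src:010d}".encode() + SEP + label.encode() + SEP
    start = prefix + f"{t_start:013d}".encode()
    stop = prefix + f"{t_end:013d}".encode()
    return store.scan(start, stop)


if __name__ == "__main__":
    store = SortedRowStore()
    store.put(edge_key(1, "follows", 1000, 2), {"weight": 0.5})
    store.put(edge_key(1, "follows", 2000, 3), {"weight": 0.7})
    store.put(edge_key(1, "follows", 3000, 4), {"weight": 0.9})
    store.put(edge_key(2, "follows", 1500, 1), {"weight": 0.1})
    for key, value in edges_in_time_range(store, 1, "follows", 1000, 3000):
        print(key, value)
```

Because the range query touches only one contiguous block of keys, its cost depends on the number of matching edges rather than on the size of the table. This is the general property that makes sorted row-key stores like HBase attractive for range queries. Whether G-HBase uses this particular layout is not stated in the abstract.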
Keywords/Search Tags: data warehouse, graph data, HBase