
Research And Application Of The Storage Of HBase

Posted on: 2015-12-16
Degree: Master
Type: Thesis
Country: China
Candidate: X P Feng
Full Text: PDF
GTID: 2298330467963847
Subject: Computer Science and Technology
Abstract/Summary:
The Internet has made tremendous progress, and vast amounts of data are now uploaded to it every day. As the scale of data grows, many business scenarios have begun to require horizontally scalable data storage, in which storage nodes can be added and removed as needed. Current relational databases, however, are designed mainly around a single machine. Massive data storage becomes a bottleneck, because a single machine cannot hold such large amounts of data. HBase is a top-level Apache open source project that was separated from Hadoop. It has most of the features of Google’s BigTable system and is implemented in Java, which has made it very popular in the era of big data. HBase’s storage mechanism differs from that of traditional relational databases in that it is based on column storage. It is a NoSQL database with great advantages in storing and querying massive data. The study of the storage mechanism and query features of HBase therefore has great practical significance.

Starting from the storage mechanism of HBase, this paper studies in depth its data loading and query characteristics. The main tasks of this paper are as follows:

First, this paper studies the storage mechanism of HBase. It analyzes the storage mechanism in detail and, based on data as it is actually stored, analyzes the storage characteristics. HBase completely changes the data storage format, trading additional storage space for fast queries.

Second, this paper studies loading data into HBase. Before HBase can be used, massive amounts of data must first be loaded into it. HBase offers several different loading methods, each with its own characteristics, and it provides an interface that allows users to customize how data is loaded. Using the MapReduce parallel computing framework, this paper implements a parallel customized loading method that achieves good efficiency.

Third, this paper studies and analyzes the query efficiency of HBase. HBase has the advantage of fast random queries over massive data. Because it does not support SQL queries, however, it is hard for it to meet the demands of complex business processes. HBase provides an integration interface for Hive, which allows data to be stored in HBase while being queried through Hive. This paper analyzes the query features of HBase and implements the integration of HBase and Hive.

Finally, this paper studies and analyzes the integration of HBase and MapReduce. Compared with other non-relational databases, HBase’s biggest advantage is that it integrates naturally with Hadoop, one of the most popular cloud computing technologies. Using the interfaces of HBase, this paper implements the integration of HBase and MapReduce, which allows algorithms to read and write HBase data directly.
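
The storage trade-off mentioned in the abstract (more space in exchange for fast queries) can be illustrated with a rough estimate. The sketch below is not taken from the thesis. It assumes the classic HBase KeyValue layout, in which every cell repeats its row key, column family, qualifier, timestamp and key type, and it ignores tags, compression, block encoding and HFile index overhead. The function names and the sample row are hypothetical.

```python
# Hypothetical sketch: estimate the serialized size of HBase cells, assuming
# the classic KeyValue layout (no tags, no compression, no block encoding).

def keyvalue_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Approximate bytes used by one cell."""
    key_len = (
        2 + len(row)        # 2-byte row length + row key
        + 1 + len(family)   # 1-byte family length + column family
        + len(qualifier)    # column qualifier (length is implied)
        + 8                 # timestamp
        + 1                 # key type (Put, Delete, ...)
    )
    # 4-byte key length + 4-byte value length + key + value
    return 4 + 4 + key_len + len(value)


def row_size(row: bytes, family: bytes, columns: dict) -> int:
    """Sum the cell sizes for one logical row; the row key is repeated per cell."""
    return sum(keyvalue_size(row, family, q, v) for q, v in columns.items())


# Made-up sample row, for illustration only.
row = b"user0001"
family = b"info"
columns = {b"name": b"Alice", b"city": b"Beijing", b"age": b"25"}

raw_bytes = len(row) + sum(len(v) for v in columns.values())
stored_bytes = row_size(row, family, columns)

print("raw payload bytes:", raw_bytes)
print("estimated KeyValue bytes:", stored_bytes)
```

Under these assumptions, short values are dominated by the per-cell key, which is one reason short row keys and short family names are commonly recommended for HBase schemas.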
Keywords/Search Tags: HBase, Hadoop, database, storage, loading data, query