Research Of Big Data Store Query Technology Based On HBase

Posted on:2016-01-20

Degree:Master

Type:Thesis

Country:China

Candidate:W J Fu

Full Text:PDF

GTID:2308330473955262

Subject:Communication and Information System

Abstract/Summary:


On big data platforms, the volume of unstructured data that must be stored keeps growing, and the requirements on its read and write performance keep rising; conventional techniques cannot meet these requirements. This thesis therefore studies big data processing technology, using HBase from the Hadoop ecosystem as its platform. It optimizes HBase as a data storage system and extends HBase with a secondary index function. When HBase is used as a data storage system, storing large objects such as product images and video information causes serious delays. This thesis analyzes this situation and designs a storage framework for large objects. The framework stores large object data separately in HDFS, so that it bypasses HBase's own Minor Compaction and Split mechanisms, which reduces the impact of large objects on HBase's read and write performance for structured data. The corresponding file address is then written to a large-object column family in HBase, which enables fast queries on large objects. Custom Flush and Compaction mechanisms are then implemented for the large-object column family to manage and maintain the large objects. Finally, the performance of the improved HBase was tested against that of stock HBase.
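The large-object path described above can be illustrated with a minimal in-memory sketch. It assumes a simple size threshold decides what counts as a large object (the `LOB_THRESHOLD` value is hypothetical; the abstract does not give one), uses a plain dict as a stand-in for HDFS, and models the table's `info` and `lob` column families as key prefixes. The class name, the path layout, and `collect_orphans` are illustrative assumptions, not the thesis's code; `collect_orphans` only hints at the kind of cleanup a custom large-object compaction might perform.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical cutoff (bytes): larger values are treated as large objects.
LOB_THRESHOLD = 1024


@dataclass
class LobAwareTable:
    """Toy model of an HBase table paired with an HDFS-like file store.

    rows:  row key -> {"family:qualifier": value}
    files: file path -> bytes (stand-in for HDFS)
    Small values live in the "info" family; large values are written to
    `files` and only their path is kept in the "lob" family.
    """
    rows: Dict[str, Dict[str, bytes]] = field(default_factory=dict)
    files: Dict[str, bytes] = field(default_factory=dict)

    def put(self, row: str, qualifier: str, value: bytes) -> None:
        cells = self.rows.setdefault(row, {})
        if len(value) > LOB_THRESHOLD:
            # Store the object outside the table, then record its address.
            path = f"/lob/{row}/{qualifier}/{uuid.uuid4().hex}"
            self.files[path] = value
            cells[f"lob:{qualifier}"] = path.encode()
            cells.pop(f"info:{qualifier}", None)
        else:
            cells[f"info:{qualifier}"] = value
            cells.pop(f"lob:{qualifier}", None)

    def get(self, row: str, qualifier: str) -> Optional[bytes]:
        cells = self.rows.get(row, {})
        if f"info:{qualifier}" in cells:
            return cells[f"info:{qualifier}"]
        path = cells.get(f"lob:{qualifier}")
        # Resolve the stored address to the object in the file store.
        return None if path is None else self.files[path.decode()]

    def collect_orphans(self) -> int:
        """Delete stored objects no longer referenced by any lob cell."""
        live = {v.decode() for cells in self.rows.values()
                for k, v in cells.items() if k.startswith("lob:")}
        orphans = [p for p in self.files if p not in live]
        for p in orphans:
            del self.files[p]
        return len(orphans)
```

Overwriting a large value leaves the old file unreferenced, which is why some periodic cleanup pass is needed; keeping only a short path in the table means HBase's own flushes, compactions, and splits never rewrite the large payload itself.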
With the improved HBase, inserting a record takes only milliseconds and insert latency is very stable, while read speed also improves by a factor of two; this meets the needs of real-time online use.

The other focus of this thesis is extending HBase with a secondary index function. HBase supports queries only on the primary key (row key); non-key queries can retrieve data only through the MapReduce framework or a Scan over the full table, and both approaches are too inefficient for real-time queries. To address this shortcoming, this thesis implements a secondary index. First, the task of building secondary indexes is distributed across the servers. Second, each part of the primary table and its corresponding index table are stored on the same server, so a query only needs to connect to the corresponding server, which speeds up non-key queries. In tests comparing HBase with the secondary index against stock HBase, insert performance with the index drops by about 10%, but query performance improves greatly.

Finally, a Hadoop + HBase + ZooKeeper cluster test environment was set up in the laboratory, and log files of product information from a Taobao merchant were used as the data source to compare the improved HBase with stock HBase. The results show that both storage and query performance improve substantially.
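The co-located secondary index can be illustrated with a minimal in-memory sketch of a region-local index. It assumes the table is split into key-range regions and that each region keeps a sorted list of index keys laid out as column, value, row key joined by a separator, so a non-key lookup becomes a prefix scan, much as it would be over HBase's sorted rows. The `Region` and `LocallyIndexedTable` classes, the `SEP` separator, and this key layout are hypothetical, not the thesis's implementation. Because the index is partitioned with the data, the sketch asks every region; the benefit is that each region resolves index hits against its own primary rows without a second round trip.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical separator between index-key parts; values must not contain it.
SEP = "\x00"


@dataclass
class Region:
    start_key: str
    rows: Dict[str, Dict[str, str]] = field(default_factory=dict)
    index: List[str] = field(default_factory=list)  # sorted index keys

    @staticmethod
    def _index_key(column: str, value: str, row: str) -> str:
        return SEP.join((column, value, row))

    def put(self, row: str, column: str, value: str) -> None:
        cells = self.rows.setdefault(row, {})
        old = cells.get(column)
        if old is not None:
            # Remove the stale index entry before writing the new value.
            stale = self._index_key(column, old, row)
            i = bisect.bisect_left(self.index, stale)
            if i < len(self.index) and self.index[i] == stale:
                self.index.pop(i)
        cells[column] = value
        bisect.insort(self.index, self._index_key(column, value, row))

    def lookup(self, column: str, value: str) -> List[Tuple[str, Dict[str, str]]]:
        # Prefix scan over the sorted index; the trailing SEP stops "ac"
        # from matching "acme".
        prefix = SEP.join((column, value)) + SEP
        i = bisect.bisect_left(self.index, prefix)
        hits = []
        while i < len(self.index) and self.index[i].startswith(prefix):
            row = self.index[i][len(prefix):]
            hits.append((row, self.rows[row]))  # primary row is local
            i += 1
        return hits


class LocallyIndexedTable:
    def __init__(self, split_keys: List[str]) -> None:
        self.regions = [Region("")] + [Region(k) for k in sorted(split_keys)]
        self._starts = [r.start_key for r in self.regions]

    def _region_for(self, row: str) -> Region:
        # A row belongs to the last region whose start key is <= row.
        return self.regions[bisect.bisect_right(self._starts, row) - 1]

    def put(self, row: str, column: str, value: str) -> None:
        # Data and its index entry are written to the same region.
        self._region_for(row).put(row, column, value)

    def find(self, column: str, value: str) -> List[Tuple[str, Dict[str, str]]]:
        # A real cluster would query regions in parallel; here, in order.
        results = []
        for region in self.regions:
            results.extend(region.lookup(column, value))
        return results
```

Writing each index entry in the same region as its row keeps index maintenance local to one server, which is consistent with the modest insert overhead the abstract reports, while a lookup avoids scanning the full table.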

Keywords/Search Tags:

Big Data, Hadoop, HBase, Storage Systems, Secondary Index


Related items

1	Research And Development Of Big Data Storage Systems Based On Hbase
2	Research On GNSS Data Storage And Retrieval Based On HBASE
3	The Research And Implementation Of Indexing And Query Techniques Based On HBase And In-memory Database
4	Design And Implication Of Mini-files Storage System Based On Hbase
5	Research And Application Of The Storage Of Hbase
6	Application And Research On Data Storage Of Rail Transit Maintenance Support System Based On Hadoop
7	Research Data Storage Index Mechanism Massive GML Space Ambient Cloud
8	Research And Implementation Of Disaster Big Data Management Methods Based On Cloud Computing
9	Research On Time Series Data Computing And Visualization Of Sensor Networks
10	The Design And Implementation Of Real-Time Query System For Mass Data Based On Hbase