Research Of Big Data Store Query Technology Based On HBase

Posted on: 2016-01-20   Degree: Master   Type: Thesis
Country: China   Candidate: W J Fu   Full Text: PDF
GTID: 2308330473955262   Subject: Communication and Information System
Abstract/Summary:
On big data platforms, the demand for storing unstructured data keeps growing and the requirements on read and write performance are increasingly high; conventional techniques can no longer meet them. This thesis therefore studies big data processing technology, taking HBase, the database of the Hadoop ecosystem, as its platform. It optimizes the data storage system and extends HBase with a secondary index function. As a data storage system, HBase suffers serious delays when large objects such as product pictures and video are written into it. The thesis analyzes this problem and designs a storage framework for large objects. The framework isolates large object data by storing it in HDFS, so that the data bypasses HBase's own Minor Compaction and Split mechanisms and its impact on HBase read and write performance is reduced. The corresponding file address is then written to a dedicated large-object column family in HBase, which allows fast queries on large objects. Flush and Compaction mechanisms are customized for this column family to manage and maintain the large objects. The improved HBase was then tested against unmodified HBase.
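The large-object design described above can be illustrated with a small in-memory sketch. It is not the thesis's implementation: `FakeHdfs` and `FakeTable` are hypothetical stand-ins for HDFS and an HBase table, the column family names `d` and `lob` are invented for illustration, and the size cutoff `LOB_THRESHOLD` is an assumption, since the abstract does not say how an object is classified as large.

```python
from dataclasses import dataclass, field

LOB_THRESHOLD = 1024  # bytes; assumed cutoff, not taken from the thesis


@dataclass
class FakeHdfs:
    """In-memory stand-in for HDFS: maps a file path to its bytes."""
    files: dict = field(default_factory=dict)

    def write(self, path: str, data: bytes) -> None:
        self.files[path] = data

    def read(self, path: str) -> bytes:
        return self.files[path]


@dataclass
class FakeTable:
    """In-memory stand-in for an HBase table with two column families:
    'd' holds small values inline, 'lob' holds only file addresses."""
    hdfs: FakeHdfs
    rows: dict = field(default_factory=dict)  # row key -> {(family, qualifier): bytes}

    def put(self, row: str, qualifier: str, value: bytes) -> None:
        cells = self.rows.setdefault(row, {})
        if len(value) >= LOB_THRESHOLD:
            # Large object: the payload goes to the file store and only
            # its address is kept in the table.
            path = f"/lob/{row}/{qualifier}"
            self.hdfs.write(path, value)
            cells[("lob", qualifier)] = path.encode()
            cells.pop(("d", qualifier), None)
        else:
            # Small value: stored inline as usual.
            cells[("d", qualifier)] = value
            cells.pop(("lob", qualifier), None)

    def get(self, row: str, qualifier: str) -> bytes:
        cells = self.rows[row]
        if ("lob", qualifier) in cells:
            # Follow the stored address to fetch the payload.
            return self.hdfs.read(cells[("lob", qualifier)].decode())
        return cells[("d", qualifier)]


hdfs = FakeHdfs()
table = FakeTable(hdfs)
table.put("item1", "title", b"shirt")        # stays inline
table.put("item1", "image", b"x" * 5000)     # redirected to the file store
```

Because the table row carries only a short address for the image, the large payload never passes through the table's own flush, compaction, or split paths in this model. The sketch leaves out cleanup of files orphaned by overwrites, which is the kind of maintenance the customized Compaction mechanism would have to handle.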
With the improved HBase, inserting a record takes only milliseconds and performance is very stable; the read speed also increases by a factor of 2, which meets the needs of real-time online use.

The other focus of this thesis is extending HBase with a secondary index function. HBase supports queries only by primary key (row key); a non-primary-key query has to use the MapReduce framework or a Scan over the full table. Both methods are very inefficient and cannot meet the needs of real-time queries. To address this shortcoming, the thesis adds secondary indexing, implemented as follows. First, the task of building the secondary indexes is distributed across the servers. Then each part of the primary table and its corresponding index table are stored on the same server, so a query only needs to connect to the corresponding server, which speeds up non-primary-key queries. HBase with the index was tested against HBase itself: although insert performance drops by 10% with indexing, query performance improves greatly.

Finally, a Hadoop + HBase + ZooKeeper cluster was set up in the laboratory as a test environment, with log files of one Taobao merchant's product information as the data source, and the improved HBase was compared with HBase itself. The result is that both storage and query performance are greatly improved.
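The secondary-index idea described above, with index entries kept on the same server as the rows they point to, can be sketched in memory as follows. This is a simplified model under stated assumptions, not the thesis's code: `RegionServer` and `Cluster` are hypothetical classes, rows are placed by hashing the row key (HBase itself partitions by key range), and a query asks every server's local index instead of routing to one server.

```python
from collections import defaultdict
from zlib import crc32


class RegionServer:
    """Hypothetical server holding a slice of the primary table
    together with the index entries for exactly those rows."""

    def __init__(self):
        self.rows = {}                  # row key -> {column: value}
        self.index = defaultdict(set)   # (column, value) -> row keys

    def put(self, row_key, columns):
        row = self.rows.setdefault(row_key, {})
        for col, val in columns.items():
            if col in row:
                # Drop the stale index entry before overwriting the cell.
                self.index[(col, row[col])].discard(row_key)
            row[col] = val
            self.index[(col, val)].add(row_key)

    def query(self, column, value):
        # Index lookup plus local row fetch; no table scan is needed.
        keys = self.index.get((column, value), ())
        return {k: self.rows[k] for k in keys}


class Cluster:
    """Hypothetical cluster; hash placement is a simplification."""

    def __init__(self, n_servers):
        self.servers = [RegionServer() for _ in range(n_servers)]

    def _server_for(self, row_key):
        return self.servers[crc32(row_key.encode()) % len(self.servers)]

    def put(self, row_key, columns):
        # The row and its index entries are written on the same server.
        self._server_for(row_key).put(row_key, columns)

    def query(self, column, value):
        result = {}
        for server in self.servers:
            result.update(server.query(column, value))
        return result


cluster = Cluster(3)
cluster.put("p1", {"shop": "A", "price": "10"})
cluster.put("p2", {"shop": "B", "price": "10"})
cluster.put("p3", {"shop": "A", "price": "20"})
rows_of_shop_a = cluster.query("shop", "A")   # rows p1 and p3
```

Each write updates the index as well as the row, which is consistent with the abstract's observation that indexing costs some insert performance in exchange for much faster non-primary-key queries.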
Keywords/Search Tags: Big Data, Hadoop, HBase, Storage Systems, Secondary Index