Research On HBase Query And Index Mechanism

Posted on: 2017-07-14
Degree: Master
Type: Thesis
Country: China
Candidate: H Qin
GTID: 2348330533950148
Subject: Computer Science and Technology
Abstract/Summary:
As a non-relational database, HBase copes well with the challenges posed by massive volumes of unstructured data. Its underlying file system is HDFS, which offers strong concurrency, easy scalability, and high reliability. With HDFS underneath, HBase far outperforms traditional relational databases in parallel processing, data compression, and mass storage, and it has therefore been widely adopted.

HBase's index design, however, has significant limitations. Most importantly, HBase natively supports retrieval only by RowKey, so client applications often have to perform a full table scan to accomplish even simple tasks. Huawei's HBase secondary indexing scheme, HIndex, addresses this problem, but HIndex is inefficient when queries concentrate on hot (frequently accessed) data. In addition, HBase's scan queue management strategy does not take into account that many users may need to scan the same StoreFiles. Scholars abroad have proposed many improvements to HBase to deal with these problems, but these schemes have their own shortcomings in applicability, stability, and efficiency. It is therefore necessary to study HBase's query and index mechanisms in order to improve its retrieval efficiency.

This thesis takes HBase as its research object and studies in depth its query and indexing mechanisms, in particular HIndex. Based on an analysis of part of the HBase and HIndex source code, it proposes optimizations that address the deficiencies of the existing mechanisms. The main work is as follows:

1. Building on the implementation details of HIndex, this thesis proposes a secondary index mechanism based on a heat value, together with a caching strategy that uses the heat value to keep the secondary index entries of hot data in cache. The aim is to improve the low query efficiency that arises when many users query hot data. Experimental results show that these strategies effectively reduce query response time and improve the cache hit rate.

2. To access data, HBase must locate the specific StoreFiles and then load them into the scan queue. Because user access exhibits locality, some StoreFiles may be in much greater demand for scanning than others. This thesis studies HBase's scan management strategy for StoreFiles, analyzes its deficiencies, and improves it with the goal of satisfying more user requests per unit time. The results show that the improved strategy serves more queries per unit time.

Overall, the experimental results show that the heat-based index strategy for HIndex and the heat-based caching strategy effectively improve query efficiency when hot data is relatively concentrated. When users issue multiple queries over a period of time, the improved scan queue loading strategy satisfies more queries per unit time.
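
The heat-based caching idea in item 1 can be illustrated with a small sketch. The abstract does not give the thesis's heat formula or cache design, so everything below is an assumption made for illustration: the class name `HeatIndexCache`, its methods, and the choice to define the heat value as a plain access count are all hypothetical. With that definition the policy is simply least-frequently-used eviction, and it does not use any HBase or HIndex API.

```python
from typing import Dict, Hashable, Iterable, List, Optional


class HeatIndexCache:
    """Fixed-capacity cache of secondary-index entries (hypothetical sketch).

    Maps an indexed column value to the row keys that contain it and
    evicts the entry with the lowest heat value when the cache is full.
    Heat is assumed here to be a simple access count.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._rows: Dict[Hashable, List[str]] = {}  # indexed value -> row keys
        self._heat: Dict[Hashable, int] = {}        # indexed value -> access count
        self.hits = 0
        self.misses = 0

    def get(self, indexed_value: Hashable) -> Optional[List[str]]:
        """Return cached row keys, raising the entry's heat on a hit."""
        if indexed_value in self._rows:
            self._heat[indexed_value] += 1
            self.hits += 1
            return self._rows[indexed_value]
        self.misses += 1
        return None  # caller would fall back to the on-disk index

    def put(self, indexed_value: Hashable, row_keys: Iterable[str]) -> None:
        """Insert or refresh an entry, evicting the coldest one if needed."""
        if indexed_value not in self._rows and len(self._rows) >= self.capacity:
            coldest = min(self._heat, key=self._heat.get)
            del self._rows[coldest]
            del self._heat[coldest]
        self._rows[indexed_value] = list(row_keys)
        self._heat.setdefault(indexed_value, 1)

    def hit_rate(self) -> float:
        """Fraction of lookups served from the cache."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

In this sketch a caller would try `get` first and, on a miss, read the index table and call `put` with the result. When queries concentrate on a few indexed values, those entries accumulate heat and stay cached, which is the situation in which the abstract reports the largest gains.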
Keywords/Search Tags:HBase, secondary index, heat value, caching strategy, scanning queue