Design And Implementation Of HBase Hierarchical Auxiliary Index System

Posted on:2020-09-18

Degree:Master

Type:Thesis

Country:China

Candidate:P Dang

Full Text:PDF

GTID:2428330602952552

Subject:Engineering

Abstract/Summary:

In the era of big data, traditional relational databases face enormous challenges from explosive data growth, making it difficult to meet business requirements that demand large-scale storage and efficient retrieval. Thanks to its high availability, scalability, and partition fault tolerance, the Hadoop family of projects has gradually become an effective solution for large-scale data computation and management. As a distributed non-relational database built on the Hadoop Distributed File System (HDFS), HBase offers many advantages over relational databases for mass data storage and management. However, research and practice have shown that HBase must scan the entire table to retrieve data by a non-primary-key attribute, which is slow and costly, and this limits its application in many scenarios. Borrowing the idea of indexes from traditional databases, researchers and engineers have studied non-primary-key indexing for HBase extensively. After summarizing and analyzing the difficulties and key problems these studies face, this thesis designs and implements LB-Indexer, a hierarchical auxiliary index system for HBase based on the log-structured merge tree (LSM-Tree) and the counting Bloom filter (CBF). The main work is as follows:

(1) Following the hierarchical model of the LSM-Tree, the auxiliary index system is divided into two parts: a memory buffer and persistent storage. A cache table with low memory overhead and low time complexity serves as the underlying data structure for index storage, ensuring efficient writes and retrieval. In persistent storage, the scalability and redundant replication of HDFS ensure the high availability and stability of the index data. For file block queries, a CBF filters candidates efficiently, shortening the retrieval time of index data.

(2) Based on a study of the consistent hashing algorithm, a sharding mechanism for index data is designed and implemented, ensuring efficient retrieval and dynamic expansion of the index cluster. The hook functions of the HBase coprocessor capture data and the operations on it, so that the index data is maintained dynamically. For the massive data already stored in HBase, the MapReduce offline computing framework is used so that indexes can be built quickly in batches.

(3) In the memory buffer layer, drawing on the exponential smoothing method, this thesis proposes Hot Value, an algorithm for separating cold and hot data that is more efficient than the LRU cache eviction algorithm. It computes and ranks the heat value of the index data to separate hot data from cold data, optimizing the memory layout and improving the cache hit rate.

(4) Using the big data benchmarking tool YCSB, several groups of experiments are designed to test LB-Indexer in terms of index writing, batch construction, dynamic construction, and scalability. The results show that the auxiliary index method proposed in this thesis provides stable and efficient indexing services. Finally, comparing the retrieval speed of LB-Indexer with native HBase, Hot Cols and hot data shows that the method improves the efficiency of non-primary-key data retrieval in HBase several-fold.
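
The counting Bloom filter that the abstract names for filtering file block queries can be illustrated with a minimal sketch. This is not the LB-Indexer implementation: the class name, method names, default sizes, and the salted SHA-256 hashing scheme are all assumptions made for illustration. It only shows the general technique, in which each slot holds a counter rather than a single bit so that a key can be removed again when an index entry is deleted.

```python
import hashlib


class CountingBloomFilter:
    """Illustrative counting Bloom filter (hypothetical, not from the thesis)."""

    def __init__(self, num_counters=1024, num_hashes=4):
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _positions(self, key):
        # Derive num_hashes slot positions by salting a single SHA-256 hash.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_counters

    def add(self, key):
        # Increment every counter the key maps to.
        for pos in self._positions(key):
            self.counters[pos] += 1

    def might_contain(self, key):
        # False means definitely absent; True means possibly present.
        return all(self.counters[pos] > 0 for pos in self._positions(key))

    def remove(self, key):
        # Refuse to decrement for keys that look absent, so counters never go negative.
        if not self.might_contain(key):
            return False
        for pos in self._positions(key):
            self.counters[pos] -= 1
        return True


if __name__ == "__main__":
    cbf = CountingBloomFilter()
    cbf.add("city=Xi'an")
    print(cbf.might_contain("city=Xi'an"))  # True: the key was added
    cbf.remove("city=Xi'an")
    print(cbf.might_contain("city=Xi'an"))  # False: its counters are back to zero
```

In a setting like the one described, a negative answer from such a filter would let a query skip a file block without reading it, while a positive answer still requires checking the block because false positives are possible.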

Keywords/Search Tags:

Big Data, HBase Auxiliary Index, LSM-Tree, Counting Bloom Filter, Cache Policy

Related items

1	Hbase Non-primary Key Attribute Index Method And Implementation
2	The Research And Implementation Of Indexing And Query Techniques Based On HBase And In-memory Database
3	Content Synchronization In Distributed Systems
4	OBF-Index:A Distributed Multi-Dimensional Index Based On Ordinal Bloom Filter
5	Bloom Filter VS Weighted Bloom Filter
6	Research And Design Of Distributed Caching Policy On HBase
7	Research On Sampling Algorithm In Network Traffic Measurement
8	Multi-Bloom-Filter Query Algorithms And Their Applications
9	Research And Application Of Bloom Filter In Duplicated Webpages Deletion
10	Research On Cache Based Database Index