
The Research Of HBase Compaction And Bucket Cache Mechanism

Posted on: 2017-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y P Wang
Full Text: PDF
GTID: 2348330533450155
Subject: Computer Science and Technology
Abstract/Summary:
With the rise of Big Data, HBase, which offers high reliability, scalability, and concurrent processing capability, has been widely adopted in commercial applications and has attracted considerable attention in technology research. HBase is a distributed, column-oriented database with a flexible data model, built on the HDFS file system. Its capacity for storing, processing, and analyzing big data far exceeds that of traditional relational databases, and it provides an effective big data management solution for enterprises and users.

This thesis takes HBase as its research object, analyzes its storage architecture, and studies its storage mechanism in depth. Through analysis of part of the source code, it focuses on the BucketCache and Compaction mechanisms of HBase storage. The thesis mainly completes the following work:

Firstly, all operations in the HBase database are written by appending data. The Compaction mechanism reads data from and writes data to HDFS, which consumes extensive resources and therefore degrades system read performance. To address this problem, a Compaction algorithm based on data redundancy is proposed. By compacting the column family files whose ratio of deleted data reaches the threshold, the algorithm reduces the number of files while cleaning up useless data, which reduces disk I/O and in turn improves system read performance. Compared with the original HBase Compaction mechanism, which considers only the size and number of files and the time interval, experimental results indicate that the proposed algorithm improves system performance and enhances HBase Major Compaction capability.

Secondly, in the HBase BucketCache implementation, dynamic bucket division chooses a completely empty bucket from among all buckets to divide. When the system needs many buckets of a specified size for caching, dynamic division becomes quite frequent, which affects system read performance.
To solve this problem, a bucket allocation strategy with minimum partition is proposed. The strategy selects the bucket with the fewest cache partitions for dynamic division. Experiments show that the improved strategy reduces read response time.

The research results show that the study and improvement of the HBase Compaction and BucketCache mechanisms can effectively reduce HBase disk I/O and improve the read efficiency of the system, thereby improving its overall performance.
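
The two selection rules described above can be illustrated with a minimal Python sketch. It is one possible reading of the abstract, not the thesis's implementation and not HBase's actual API: the `StoreFile` and `Bucket` classes, their fields, the `>=` threshold comparison, and the tie-breaking behavior are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class StoreFile:
    """Hypothetical summary of one column-family file (not an HBase class)."""
    name: str
    total_cells: int
    deleted_cells: int

    @property
    def deleted_ratio(self) -> float:
        # Share of the file's cells that are marked as deleted.
        return self.deleted_cells / self.total_cells if self.total_cells else 0.0


def select_for_compaction(files, threshold):
    """Return the files whose deleted-data ratio has reached the threshold.

    Assumption: "reached" is modeled as ratio >= threshold.
    """
    return [f for f in files if f.deleted_ratio >= threshold]


@dataclass
class Bucket:
    """Hypothetical view of one cache bucket (not HBase's BucketAllocator)."""
    bucket_id: int
    partitions: int  # number of slots the bucket is currently divided into
    used: int        # slots currently holding cached blocks


def pick_bucket_to_divide(buckets):
    """Among completely free buckets, pick the one with the fewest partitions.

    Returns None when no bucket is completely free.
    """
    free = [b for b in buckets if b.used == 0]
    if not free:
        return None
    return min(free, key=lambda b: b.partitions)
```

Under these assumptions, `select_for_compaction` is driven by the share of deleted data rather than by file size, file count, or elapsed time, and `pick_bucket_to_divide` narrows the candidates for dynamic division to a single bucket instead of taking any empty one.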
Keywords/Search Tags: HBase, Compaction mechanism, data redundancy, BucketCache mechanism, least division