Research On The HBase Region Split And Load Balance Strategy

Posted on: 2016-03-06
Degree: Master
Type: Thesis
Country: China
Candidate: Z Su
Full Text: PDF
GTID: 2348330479953352
Subject: Computer system architecture
Abstract/Summary:
The amount of information stored in computers is growing explosively. Relational databases struggle with large-scale data and high concurrency, so in some application scenarios they are gradually being replaced by non-relational databases, and research on non-relational databases has become an active field. The open-source HBase is used by many Internet companies because of its high performance, high reliability, low cost, and easy scalability.

When HBase is used for incremental data, large-scale data loading is a common scenario. In this case, the main factor limiting loading speed is the time spent waiting for Region splits. Existing work on Region splitting and balancing during HBase bulk load is limited to dynamic parameter tuning and does not change the structure of the Region split itself. We design and implement a Region split strategy that collects the cluster's size and load information, estimates the split scale the current system needs, accelerates splitting in the early stage of data loading, and splits steadily in the later stage. This speeds up the split process and improves overall throughput. To better serve the bulk load process, we also implement a load balance method that adjusts the assignment of Regions to RegionServers based on a calculation of node performance and on how hot or cold each Region is.

We used the Yahoo! Cloud Serving Benchmark (YCSB) tools to run horizontal and vertical comparison tests of the multiple split and load balance strategies. The multi-split strategy improves bulk load performance compared with the original HBase strategy. From the vertical test data, we found how the load data size, the number of threads, and the cluster size affect our strategy, and how these factors constrain one another. In addition, the performance of the multi-split strategy was tested under the constant and upper-bound strategies.
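The abstract does not give the thesis's formulas, so the following is only a minimal sketch of the two ideas it describes, under stated assumptions. The function names, the `regions_per_server` and `max_fanout` parameters, and the greedy assignment rule are all hypothetical and are not taken from the thesis or from the HBase API. The first part estimates a target Region count from cluster size and picks a larger split fan-out while the table is far from that target; the second part places hot Regions on the RegionServers with the most spare capacity relative to a performance score.

```python
def target_region_count(num_servers: int, regions_per_server: int) -> int:
    """Hypothetical estimate of the Region count the cluster should reach."""
    return max(1, num_servers * regions_per_server)


def split_fanout(current_regions: int, target_regions: int, max_fanout: int = 8) -> int:
    """Number of daughter Regions to create from one Region on the next split.

    Early stage (far below target): split into several daughters at once.
    Later stage (at or near target): fall back to an ordinary two-way split.
    """
    if current_regions <= 0:
        raise ValueError("current_regions must be positive")
    needed = -(-target_regions // current_regions)  # ceiling division
    return max(2, min(max_fanout, needed))


def assign_regions(region_heat: dict[str, float],
                   server_score: dict[str, float]) -> dict[str, list[str]]:
    """Greedy placement (an assumed rule): hottest Regions first, each going to
    the server whose load relative to its performance score stays lowest."""
    load = {server: 0.0 for server in server_score}
    plan = {server: [] for server in server_score}
    hottest_first = sorted(region_heat.items(), key=lambda kv: kv[1], reverse=True)
    for region, heat in hottest_first:
        best = min(server_score, key=lambda s: (load[s] + heat) / server_score[s])
        plan[best].append(region)
        load[best] += heat
    return plan


if __name__ == "__main__":
    target = target_region_count(num_servers=4, regions_per_server=8)
    for regions in (1, 8, 32):
        print(regions, "regions ->", split_fanout(regions, target), "daughters per split")
    print(assign_regions({"r1": 9, "r2": 6, "r3": 3}, {"fast": 2.0, "slow": 1.0}))
```

In this sketch the fan-out shrinks toward 2 as the Region count approaches the target, which mirrors the "accelerate early, stabilize later" behaviour described above; a real implementation would also need to account for Region size and split cost.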
Keywords/Search Tags: big data, HBase, multiple split, load balance, bulk load