
Research And Implementation Of A Performance Tuning Method For A Distributed Storage System Named HBase

Posted on: 2019-10-08
Degree: Master
Type: Thesis
Country: China
Candidate: N Zhang
GTID: 2428330572455927
Subject: Engineering
Abstract/Summary:
With the advent of the big data era, the amount of data stored in databases has grown rapidly, and the volume of data access has increased with it. As a result, system functions respond more and more slowly. In an actual system, the response time of some functions grows as the data size increases, and data insertion and selection take longer and longer. Reducing the response time of system functions by optimizing HBase performance has therefore become an important issue for such systems, and modeling and optimizing database performance is an important and urgent problem for both industry and academia.

To address this database performance optimization problem, this thesis surveys and summarizes the state of domestic and international research on performance optimization of non-relational databases, chooses HBase as the research object, and implements an HBase performance tuning method. Based on a large amount of experimental data, the thesis analyzes the relationship between HBase performance and HBase characteristics, builds performance prediction models with the random forest algorithm, and then proposes an improved genetic algorithm that optimizes HBase performance on the basis of these prediction models. Finally, an HBase performance tuning method is implemented. The specific contributions of this thesis are as follows.

(1) Selecting HBase characteristics and generating training samples. Based on the descriptions in the official HBase documentation, the characteristics of HBase are studied in two steps to obtain those that are related to performance. Representative feature sample sets are then selected by the orthogonal experimental design method, and HBase performance results are obtained by conducting a large number of experiments.

(2) Building the performance prediction models. After comparing several machine learning algorithms, this thesis uses the random forest algorithm and the training samples to obtain the importance of the HBase features and to select features based on the prediction models. Finally, the prediction models are obtained.

(3) Designing and implementing the performance optimization algorithm based on the prediction models. This thesis designs a fitness function on the basis of the throughput prediction model and the latency prediction model, and improves the crossover step. The cut-based roulette method and an adaptive mutation operation are also adopted, and the improved genetic algorithm is then implemented. HBase performance is optimized with the improved genetic algorithm, which yields the optimal solution for HBase performance and the corresponding optimal parameter configuration.

Experiments are carried out to verify that the prediction models and the proposed improved genetic algorithm are effective and correct. Experimental samples are obtained from four typical applications of the Yahoo! Cloud Serving Benchmark. Based on these samples, the thesis builds prediction models separately with random forest and 3 other machine learning algorithms, and compares their error rates on 150 testing samples to verify the accuracy of the prediction models. HBase performance is then optimized separately with the improved genetic algorithm and 3 other optimization algorithms, and the optimization results are compared.

The HBase performance tuning method proposed in this thesis is applied to the actual system, and the response time of functions before and after optimization is compared and analyzed. The optimized HBase parameter configuration is given, and the specific reasons for its performance improvement are analyzed. It is finally shown that the proposed HBase performance tuning method is efficient and can optimize HBase performance in actual systems.
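
The optimization step described in (3) can be illustrated with a minimal sketch. It is not the thesis's implementation: the parameter names and ranges are hypothetical, the two `predict_*` functions are toy stand-ins for the trained random forest models, the fitness formula is an assumed way of combining throughput and latency, and plain roulette-wheel selection with single-point crossover replaces the cut-based roulette method and the improved crossover step, whose details the abstract does not give.

```python
import random

# Hypothetical parameters and ranges; illustrative only, not taken from the thesis.
PARAM_RANGES = {
    "handler_count": (10, 200),
    "memstore_flush_mb": (64, 512),
    "block_cache_pct": (10, 60),
}
NAMES = list(PARAM_RANGES)


def predict_throughput(cfg):
    # Toy stand-in for a trained throughput model (higher is better).
    return cfg["handler_count"] * 10 - abs(cfg["block_cache_pct"] - 40) * 20


def predict_latency(cfg):
    # Toy stand-in for a trained latency model (lower is better, always > 0).
    return 5.0 + abs(cfg["memstore_flush_mb"] - 256) / 64 + cfg["handler_count"] / 100


def fitness(cfg):
    # Assumed form: reward predicted throughput, penalise predicted latency.
    # Clamped to stay positive so it can be used as a roulette weight.
    return max(predict_throughput(cfg), 1.0) / predict_latency(cfg)


def random_config(rng):
    return {n: rng.randint(lo, hi) for n, (lo, hi) in PARAM_RANGES.items()}


def roulette_select(population, scores, rng):
    # Plain fitness-proportional (roulette-wheel) selection.
    return rng.choices(population, weights=scores, k=1)[0]


def crossover(a, b, rng):
    # Single-point crossover over the ordered parameter list.
    point = rng.randint(1, len(NAMES) - 1)
    return {n: (a[n] if i < point else b[n]) for i, n in enumerate(NAMES)}


def mutate(cfg, rate, rng):
    # Resample each parameter within its range with probability `rate`.
    out = dict(cfg)
    for n, (lo, hi) in PARAM_RANGES.items():
        if rng.random() < rate:
            out[n] = rng.randint(lo, hi)
    return out


def optimize(generations=40, pop_size=30, seed=0):
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        scores = [fitness(c) for c in population]
        spread = (max(scores) - min(scores)) / max(scores)
        # Adaptive mutation (assumed rule): mutate more once the population converges.
        rate = 0.3 if spread < 0.1 else 0.05
        children = [
            mutate(
                crossover(
                    roulette_select(population, scores, rng),
                    roulette_select(population, scores, rng),
                    rng,
                ),
                rate,
                rng,
            )
            for _ in range(pop_size - 1)
        ]
        population = [best] + children  # elitism: keep the best configuration so far
        best = max(population, key=fitness)
    return best, fitness(best)
```

Calling `optimize()` returns the best configuration found and its fitness. In a real setting the two predictors would be replaced by models trained on benchmark samples, and the resulting configuration would still need to be validated on the actual cluster.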
Keywords/Search Tags:Performance Optimization, Random Forest, Genetic Algorithm, HBase