Research On Hadoop Configuration Optimization Based On Improved Particle Swarm Optimization Algorithm

Posted on: 2022-02-20   Degree: Master   Type: Thesis
Country: China   Candidate: W Ji   Full Text: PDF
GTID: 2518306479471814   Subject: Computer technology
Abstract/Summary:
The performance of MapReduce jobs running on the Hadoop distributed framework is affected by the hardware resources of the cluster and by the configuration parameters of its components. To use limited hardware resources well and bring the cluster to its best performance, a reasonable and effective set of cluster configuration parameters must be found. This thesis treats the complex combination of Hadoop configuration parameters as an optimization problem and seeks an accurate modeling method and an efficient parameter optimization algorithm. Based on an analysis of existing Hadoop cluster configuration optimization frameworks and of the application of swarm intelligence algorithms to complex combinatorial optimization problems, it proposes a prototype Hadoop configuration optimization framework based on an improved particle swarm optimization (PSO) algorithm.

The framework has two main components: a performance predictor, which builds a model of the MapReduce job execution phases, and a parameter optimizer, which optimizes the Hadoop cluster configuration. MapReduce job execution is divided into detailed stages, and a performance model of each stage is designed using the random forest regression algorithm, combining formula derivation with performance prediction. The performance predictor yields the performance model of each stage together with the corresponding cluster configuration parameters. From this model, the fitness function required by the PSO algorithm is derived, and the corresponding cluster configuration parameters are used as the initial positions of the population.

Analysis shows, however, that the PSO algorithm is prone to poor population diversity and premature convergence during the search. To obtain better configuration optimization results, this thesis proposes a particle-replacement two-population comprehensive learning PSO algorithm (PP-CLPSO). The algorithm builds a two-population system from an adaptive inertia weight population and a Logistic chaotic population. A particle numbering mechanism links the particles that carry the same number in the two populations. The adaptive inertia weight population is the observed population: when its particles become trapped in a local optimum, the chaotic population dynamically performs a particle replacement operation. To increase population diversity, two strategies, comprehensive learning and local learning after particle replacement, are used for global exploration and local search respectively, which effectively improves the accuracy of the search for the optimal solution. With PP-CLPSO as the parameter optimizer of the framework, more effective cluster configuration parameters can be found in the search space. The Hadoop configuration optimization framework based on the PP-CLPSO algorithm is called PPCLPSO-HCOF.

To verify the modeling accuracy of the PPCLPSO-HCOF performance predictor and the optimization performance of its parameter optimizer, experiments compare two model construction methods on input data sets of different sizes. The results show that the model built by PPCLPSO-HCOF is more accurate. Compared with four PSO variants and four other swarm intelligence optimization algorithms, PP-CLPSO achieves higher solution accuracy and faster convergence on unimodal and multimodal test functions. Finally, the cluster configuration parameters optimized by PPCLPSO-HCOF give more efficient job execution for MapReduce programs on the Hadoop cluster.
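
The idea of replacing stagnant particles with chaotically generated ones can be illustrated with a much simpler, single-population sketch. The code below is not PP-CLPSO: the abstract does not give the adaptive inertia weight rule, the particle numbering mechanism, or the comprehensive learning strategy, so a linearly decreasing inertia weight and a per-particle stall counter are swapped in as assumptions. The function names, the `stall_limit` parameter, the acceleration coefficients, and the `sphere` test function are all hypothetical choices made for illustration.

```python
import random


def sphere(x):
    """Unimodal test function with its global minimum 0 at the origin."""
    return sum(v * v for v in x)


def logistic_map(z, mu=4.0):
    """One step of the Logistic chaotic map z -> mu * z * (1 - z)."""
    return mu * z * (1.0 - z)


def pso_with_chaotic_replacement(fitness, dim, lo, hi, n_particles=20,
                                 iters=200, stall_limit=10, seed=0):
    """Minimise `fitness` over [lo, hi]^dim (illustrative sketch only)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    stall = [0] * n_particles  # iterations since each personal best improved

    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    z = 0.37          # chaotic state; arbitrary start inside (0, 1)
    c1 = c2 = 1.5     # assumed acceleration coefficients

    for t in range(iters):
        # Assumption: linear decay from 0.9 to 0.4 stands in for the
        # thesis's adaptive inertia weight, which the abstract does not define.
        w = 0.9 - 0.5 * t / max(1, iters - 1)
        for i in range(n_particles):
            if stall[i] >= stall_limit:
                # Stagnant particle: replace its position with a point
                # generated by the Logistic map and reset its velocity.
                for d in range(dim):
                    z = logistic_map(z)
                    if not 0.0 < z < 1.0:  # guard against a collapsed orbit
                        z = rng.uniform(0.01, 0.99)
                    pos[i][d] = lo + z * (hi - lo)
                    vel[i][d] = 0.0
                stall[i] = 0
            else:
                # Standard PSO velocity and position update, clamped to bounds.
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][d] = (w * vel[i][d]
                                 + c1 * r1 * (pbest[i][d] - pos[i][d])
                                 + c2 * r2 * (gbest[d] - pos[i][d]))
                    pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))

            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                stall[i] = 0
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
            else:
                stall[i] += 1
    return gbest, gbest_val


if __name__ == "__main__":
    best, value = pso_with_chaotic_replacement(sphere, dim=3, lo=-5.0, hi=5.0)
    print(best, value)
```

In a configuration-tuning setting, each dimension would presumably correspond to one Hadoop parameter scaled to its allowed range, and `fitness` would be the job-time estimate produced by a performance model; both mappings are assumptions here rather than details taken from the thesis.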
Keywords/Search Tags: Hadoop, configuration optimization, PSO, MapReduce