The Research And Implementation Of Distributed Message System Performance Optimization

Posted on: 2020-11-04
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Xu
GTID: 2428330602450555
Subject: Software engineering
Abstract/Summary:
In recent years, with the rapid development of financial applications, sensor networks, and other fields, new data is generated every second. To obtain analytical results in real time from massive amounts of new data, many large Internet companies use stream computing to process the data. Distributed message systems are widely adopted as the underlying communication backbone of stream computing. As data volumes grow, the performance problems of distributed message systems are gradually being exposed.

To support different application scenarios, distributed message systems expose a large number of configurable parameters. However, most users find it difficult to tune these parameters to improve system performance. Existing performance optimization methods can improve the data processing capacity of a distributed message system, but they are not effective when the sample size is small and the time is limited. How to optimize the performance of a distributed message system within a limited optimization time has therefore become an urgent problem for both industry and academia.

To solve this problem, this thesis proposes an automated performance optimization method built on an analysis and model of the distributed message system. From experimental samples, the method constructs a performance comparison model based on random forests, and it designs an optimal configuration search algorithm that relies on the model's ability to rank a set of configurations. The algorithm searches for configurations that make the system perform better within a limited time budget. The research content is divided into the following aspects:

(1) Defining the performance optimization problem of a distributed message system. The performance of the distributed message system is modeled, and the application scenario, operating environment, system configuration, and other factors that influence performance are defined mathematically. The thesis analyzes the performance optimization problem and the shortcomings of existing solutions. Finally, the problem is modeled mathematically and given a clear definition.

(2) Building a performance comparison model based on random forests. The configuration parameters of the distributed message system are preprocessed and sampled, and throughput experiments on a real cluster produce the initial training samples. A random forest model is trained on these samples to build a performance model that compares the throughput of configurations. The accuracy of the performance comparison model is verified with a ranking accuracy formula.

(3) Designing and implementing a search-based performance optimization method. A search-based configuration optimization method is implemented on top of the performance comparison model and weighted Latin hypercube sampling. Within a limited optimization time, the method combines a global search algorithm with a local exploration algorithm, selects the best configuration parameter combinations, and submits the candidate configuration set to the real cluster for throughput testing. The measured throughput is then fed back into the optimization method for the next iteration. When the optimization time is over, the best configuration parameters found so far are output.

In the experimental part of this thesis, the same training set is used to train a traditional performance prediction model and the random-forest-based performance comparison model, and the ranking accuracy of the two models on the test set is compared. In eight different application scenarios, the proposed performance optimization method is verified and analyzed against five comparison algorithms: Random, BestConfig, RFHOC, Hyperopt, and SMAC. The experimental results show that the throughput obtained by the proposed method is 21% higher than that obtained by the comparison algorithms.
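
Two of the building blocks named above, Latin hypercube sampling of the configuration space and a ranking accuracy measure for the comparison model, can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The abstract states neither the weighting scheme nor the ranking accuracy formula, so the sketch assumes plain (unweighted) Latin hypercube sampling and defines ranking accuracy as the fraction of configuration pairs whose predicted order matches the measured order. The function names `latin_hypercube` and `ranking_accuracy` and the example parameter ranges are hypothetical.

```python
import itertools
import random


def latin_hypercube(n_samples, bounds, rng):
    """Plain (unweighted) Latin hypercube sampling; an assumption, since the
    thesis uses a weighted variant that the abstract does not specify.

    Each parameter range is split into n_samples equal strata, and every
    stratum is used exactly once per parameter.
    """
    columns = []
    for low, high in bounds:
        width = (high - low) / n_samples
        # One uniform draw inside each stratum, then shuffle the strata order
        # so that the parameters are paired at random.
        column = [low + (i + rng.random()) * width for i in range(n_samples)]
        rng.shuffle(column)
        columns.append(column)
    # Transpose the per-parameter columns into one tuple per configuration.
    return [tuple(point) for point in zip(*columns)]


def ranking_accuracy(predicted, measured):
    """Assumed definition: the fraction of configuration pairs whose predicted
    order agrees with the measured throughput order. Pairs with tied
    measurements are skipped."""
    correct = total = 0
    for i, j in itertools.combinations(range(len(measured)), 2):
        if measured[i] == measured[j]:
            continue
        total += 1
        if (predicted[i] - predicted[j]) * (measured[i] - measured[j]) > 0:
            correct += 1
    return correct / total if total else 0.0


# Hypothetical usage: two made-up parameters with illustrative ranges.
rng = random.Random(0)
candidates = latin_hypercube(5, [(1.0, 64.0), (0.0, 1.0)], rng)
```

A pairwise measure of this kind rewards a model for ordering configurations correctly even when its absolute throughput estimates are off, which is the property a comparison-based search depends on.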
Keywords/Search Tags:Performance optimization, Machine learning, Distributed message system