
Research On Distributed Boosting Algorithm And Its Application In Image Target Detection

Posted on: 2021-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Zhang
GTID: 2428330620963967
Subject: Engineering
Abstract/Summary:
Boosting is a family of machine learning algorithms that improve classification accuracy by combining weak learners. It is widely used because of its good generalization performance, high efficiency, and ease of use. On a large-scale dataset, however, traditional centralized training on a single machine takes a long time and strains the machine's memory. Extending the standard single-machine Boosting algorithm to a distributed environment is an effective way to reduce training time and relieve this memory pressure. This thesis first introduces two existing distributed Boosting algorithms and analyzes their advantages and disadvantages. It then proposes a distributed Boosting algorithm based on a representative subset strategy and evaluates its performance on the Spark distributed computing platform. Finally, the idea behind the algorithm is combined with an AdaBoost-based face detection method. The main contents are as follows.

Firstly, two existing distributed Boosting algorithms, DistBoost and PreWeak, are introduced and their problems are analyzed: local training in DistBoost overfits easily, and PreWeak requires a large number of communication rounds.

Secondly, building on earlier research into the relationship between game theory and Boosting, a distributed Boosting algorithm based on representative subsets is proposed. It effectively alleviates overfitting in local training and reduces the number of communications between nodes. The basic idea is to select a small representative subset from the training samples of each distributed node and send it to the central node, which aggregates the subsets and runs the Boosting algorithm on them. A representative subset must meet two conditions. First, it consists of samples that are difficult to classify on its node. Second, the learner trained on it has the smallest classification error on the remaining data of that node.

Thirdly, experiments are run on the Spark distributed computing platform using five Alibaba Cloud ECS servers. Three parameters of the representative subset Boosting algorithm and two runtime parameters of the Spark job submission are tested and tuned in this environment. The standard single-machine Boosting algorithm, DistBoost, PreWeak, and the proposed representative subset Boosting algorithm are then compared on four datasets. The comparison of model training time and of prediction accuracy on the training data set verifies that the representative subset Boosting algorithm consistently approaches the accuracy of the single-machine Boosting algorithm while effectively reducing training time.

Fourthly, the idea of the representative subset Boosting algorithm is combined with the AdaBoost-based face detection method, extending the training of its classifier to a distributed environment. The experimental results show that training the face detection classifier this way takes significantly less time than training on a single machine.
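The representative subset idea described above can be illustrated with a minimal, single-process sketch. It is not the thesis's implementation: the abstract does not say how the two conditions are satisfied, so the sketch assumes that "difficult" samples are those with the highest AdaBoost weights after a few local rounds, and that the subset is chosen from several random candidates by the error of a learner trained on it against the node's remaining data. All function names, the decision-stump weak learner, and the parameters `size`, `n_candidates`, and `rounds` are hypothetical, and plain Python lists stand in for Spark partitions.

```python
import math
import random

def fit_stump(X, y, w):
    """Best decision stump (feature, threshold, sign, weighted error) under weights w."""
    best = (0, 0.0, 1, float("inf"))
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for s in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (-s if row[j] <= t else s) != yi)
                if err < best[3]:
                    best = (j, t, s, err)
    return best

def stump_predict(stump, row):
    j, t, s, _ = stump
    return -s if row[j] <= t else s

def adaboost(X, y, rounds):
    """Plain AdaBoost on labels in {-1, +1}; returns the ensemble and final sample weights."""
    n = len(y)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[3], 1e-10), 1 - 1e-10)  # clip to avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        model.append((alpha, stump))
    return model, w

def predict(model, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1

def representative_subset(X, y, size, n_candidates=5, rounds=5, seed=0):
    """Assumed selection rule: sample candidates from the hardest local points
    (condition 1) and keep the one whose learner errs least on the rest (condition 2)."""
    rng = random.Random(seed)
    _, w = adaboost(X, y, rounds)
    hard = sorted(range(len(y)), key=lambda i: -w[i])[: 2 * size]
    best_idx, best_err = None, float("inf")
    for _ in range(n_candidates):
        idx = rng.sample(hard, size)
        chosen = set(idx)
        rest = [i for i in range(len(y)) if i not in chosen]
        model, _ = adaboost([X[i] for i in idx], [y[i] for i in idx], rounds)
        err = sum(predict(model, X[i]) != y[i] for i in rest) / len(rest)
        if err < best_err:
            best_idx, best_err = idx, err
    return best_idx

def distributed_boost(partitions, size, rounds=10):
    """Each (X, y) partition stands in for one node; the 'central node' boosts on the union."""
    X_c, y_c = [], []
    for X, y in partitions:
        for i in representative_subset(X, y, size):
            X_c.append(X[i])
            y_c.append(y[i])
    model, _ = adaboost(X_c, y_c, rounds)
    return model
```

In this sketch each node sends only `size` samples to the central node, in a single communication step, which is the property the abstract emphasizes. In a real Spark job the per-partition selection would run on the workers rather than in a local loop.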
Keywords/Search Tags: Boosting, Distributed Computing, Spark, Classification, Face Detection