
The Study of the Ensemble Evolve Algorithm Based on Sampling

Posted on: 2016-02-15    Degree: Master    Type: Thesis
Country: China    Candidate: W Z Song    Full Text: PDF
GTID: 2308330464470743    Subject: Computer software and theory
Abstract/Summary:
Ensemble learning is a relatively recent machine learning paradigm. It employs multiple weak learners to solve the same problem and can remarkably improve a system's generalization ability, so studying and developing ensemble learning has been a research trend since the 1990s. Through extensive research by many scholars, ensemble learning has been applied successfully in fields such as image processing, Web information mining and biometric feature recognition. The two major algorithm families in ensemble learning are Boosting and Bagging, whose main current drawbacks include a lack of training samples and an overly large ensemble size. The main work of this paper is as follows.

1. An Ensemble Evolve Classification Algorithm for Controlling the Size of the Final Model (ECSM) is proposed. The well-known AdaBoost algorithm of the Boosting family works iteratively: every round produces a weak learner of relatively low accuracy and updates the weights of the training samples, reducing the weight of correctly classified samples and increasing the weight of incorrectly classified ones; the final model is then obtained by weighted voting. Such an algorithm produces a large number of weak learners over many iterations, which leads to an overly large model with poor expressiveness and therefore poor interpretability. To address this issue, this paper introduces a genetic algorithm into AdaBoost and proposes the ECSM algorithm. Instead of accumulating weak learners round after round, ECSM starts from the weak learners themselves and uses genetic operations together with an evaluation function to find the optimal weak learner in every round, so the problem of an overly large final model is resolved. The algorithm is compared with the traditional algorithm in experiments, which verify that the model produced by ECSM is smaller than that of AdaBoost while maintaining classification accuracy.

2. An Ensemble Evolve Classification Algorithm based on Consistency Sampling (EECS) is proposed. The ECSM algorithm can reduce the model size while preserving accuracy, but it handles large-sample data poorly and must perform a global search in every round, so building the model takes too long. To address this issue, this paper designs a computational formula based on the consistency of classification results. Using this formula, the consistency between the current round's classification result and those of the previous rounds can be calculated; this value is then used to update the sampling probability of each training sample, enabling the algorithm to cope with large-sample data and to build the ensemble model faster. The algorithm is tested on 5 UCI data sets under Weka, and the results show that it clearly outperforms ECSM in terms of time efficiency.
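To make the ECSM idea in point 1 concrete, the following is a minimal sketch of an AdaBoost-style loop in which each round runs a small genetic search over candidate weak learners (decision stumps here) instead of a single training step, keeping the number of rounds, and hence the final model, small. The stump representation, the genetic operators, and the weighted-accuracy fitness are illustrative assumptions; the thesis's exact evaluation function and operators are not given in this abstract.

```python
# Illustrative ECSM-style sketch: AdaBoost reweighting + genetic selection of
# the weak learner in each round. Assumes labels y in {-1, +1}.
import numpy as np

rng = np.random.default_rng(0)

def stump_predict(stump, X):
    """Predict {-1,+1} with a decision stump (feature index, threshold, polarity)."""
    f, thr, pol = stump
    return np.where(pol * (X[:, f] - thr) > 0, 1, -1)

def random_stump(X):
    f = rng.integers(X.shape[1])
    thr = rng.uniform(X[:, f].min(), X[:, f].max())
    return (f, thr, rng.choice([-1, 1]))

def mutate(stump, X):
    f, thr, pol = stump
    if rng.random() < 0.3:                      # occasionally jump to a new feature
        return random_stump(X)
    thr = thr + rng.normal(0, 0.1 * (X[:, f].std() + 1e-12))
    return (f, thr, pol)

def ecsm_fit(X, y, rounds=5, pop_size=20, generations=10):
    """Return a small ensemble [(alpha, stump), ...]."""
    n = len(y)
    w = np.full(n, 1.0 / n)                     # AdaBoost-style sample weights
    ensemble = []
    for _ in range(rounds):
        pop = [random_stump(X) for _ in range(pop_size)]
        for _ in range(generations):            # evolve: keep fittest half, mutate it
            fitness = [np.sum(w * (stump_predict(s, X) == y)) for s in pop]
            order = np.argsort(fitness)[::-1]
            elite = [pop[i] for i in order[: pop_size // 2]]
            pop = elite + [mutate(s, X) for s in elite]
        best = max(pop, key=lambda s: np.sum(w * (stump_predict(s, X) == y)))
        pred = stump_predict(best, X)
        err = max(np.sum(w * (pred != y)), 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)   # standard AdaBoost combination weight
        w *= np.exp(-alpha * y * pred)          # shrink correct, grow incorrect
        w /= w.sum()
        ensemble.append((alpha, best))
    return ensemble

def ecsm_predict(ensemble, X):
    return np.sign(sum(a * stump_predict(s, X) for a, s in ensemble))
```

Because only one evolved weak learner is kept per round and the number of rounds is fixed and small, the final model size is bounded by `rounds`, in contrast to a long AdaBoost run.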
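The EECS variant in point 2 replaces the global search over all data with subsampling driven by a consistency value between the current round's predictions and those of earlier rounds. The abstract does not give the consistency formula, so the agreement-based update below is only a hypothetical illustration; it reuses `ecsm_fit` and `stump_predict` from the previous sketch, and the rule that inconsistent (unstable) samples are resampled more often is an assumption.

```python
# Hypothetical EECS-style sketch: consistency-driven sampling probabilities.
# Relies on ecsm_fit / stump_predict defined in the ECSM sketch above.
import numpy as np

rng = np.random.default_rng(1)

def consistency(current_pred, history):
    """Per-sample fraction of earlier rounds that agree with the current prediction."""
    if not history:
        return np.ones_like(current_pred, dtype=float)
    past = np.stack(history)                    # shape: (rounds_so_far, n_samples)
    return (past == current_pred).mean(axis=0)

def eecs_fit(X, y, rounds=5, sample_frac=0.3, pop_size=20, generations=10):
    n = len(y)
    p = np.full(n, 1.0 / n)                     # sampling probabilities over samples
    history, ensemble = [], []
    for _ in range(rounds):
        idx = rng.choice(n, size=max(1, int(sample_frac * n)), replace=True, p=p)
        sub = ecsm_fit(X[idx], y[idx], rounds=1,  # one evolved weak learner per round
                       pop_size=pop_size, generations=generations)
        alpha, stump = sub[0]
        pred = stump_predict(stump, X)
        c = consistency(pred, history)          # high c = stable, low c = unstable
        p = p * (1.0 + (1.0 - c))               # assumed rule: resample unstable points
        p /= p.sum()
        history.append(pred)
        ensemble.append((alpha, stump))
    return ensemble
```

Training each round on a probability-weighted subsample rather than the full data is what lets this scheme scale to large-sample data and build the ensemble faster, which matches the time-efficiency comparison against ECSM reported above.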
Keywords/Search Tags: ensemble learning, genetic manipulation, consistency, sampling probability