Integrated Classifier Learning Algorithm

Posted on: 2012-07-14
Degree: Master
Type: Thesis
Country: China
Candidate: J H Zhang
Full Text: PDF
GTID: 2208330332989724
Subject: Computer software and theory
Abstract/Summary:
Classification is an important task in data mining: a model is first built from labeled data and then used to classify unlabeled data. Accuracy is a key indicator of classification performance, but a single learner rarely achieves high accuracy, which motivated the emergence of ensemble learning. A classifier ensemble solves a learning task with more than one classifier and can achieve better performance than a single classifier. Ensemble learning has become an active research direction in machine learning in recent years and, because of its strong properties, has been widely applied in fields such as planetary exploration, character recognition, biometric authentication, and web information filtering.
An effective ensemble currently depends on two conditions. First, the accuracy of each individual learner must not be too low; otherwise the ensemble cannot predict well. Second, the learners must be diverse with respect to one another; without diversity, combining them is pointless. Diversity can be created in several ways: feature-based techniques select feature subsets by some strategy and obtain diversity from the different subsets, while data-based techniques obtain diversity by sampling the data set. Some issues in ensemble learning remain unresolved, however, notably how to design learners with greater diversity and how to measure diversity. Building on this understanding of ensemble learning, this thesis applies ensembles to classification and improves two classical algorithms, AdaBoost and Bagging, obtaining good classification performance and faster convergence.
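As a rough illustration of data-based diversity, the following is a minimal sketch of Bagging with majority voting. It is not the thesis's implementation: the one-dimensional threshold learner `train_stump` and all function names are hypothetical and chosen only to keep the example self-contained.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement (the source of diversity)."""
    return [rng.choice(data) for _ in data]

def train_stump(sample):
    """Hypothetical base learner: threshold midway between the class means."""
    xs0 = [x for x, y in sample if y == 0]
    xs1 = [x for x, y in sample if y == 1]
    if not xs0 or not xs1:
        # Degenerate sample with one class only: always predict that class.
        label = 0 if xs0 else 1
        return lambda x: label
    m0 = sum(xs0) / len(xs0)
    m1 = sum(xs1) / len(xs1)
    threshold = (m0 + m1) / 2
    if m0 <= m1:
        return lambda x: 1 if x > threshold else 0
    return lambda x: 0 if x > threshold else 1

def bagging(data, n_learners, seed=0):
    """Train one base learner per bootstrap sample."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(n_learners)]

def majority_vote(learners, x):
    """Combine the learners' predictions by unweighted voting."""
    votes = Counter(learner(x) for learner in learners)
    return votes.most_common(1)[0][0]

# Usage: six labeled points (feature, label), eleven learners.
data = [(0.0, 0), (0.5, 0), (1.0, 0), (3.0, 1), (3.5, 1), (4.0, 1)]
learners = bagging(data, n_learners=11)
```

Each learner sees a different bootstrap sample, so their thresholds differ slightly; the vote smooths out the individual errors as long as each learner is reasonably accurate.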
Specifically, the thesis completes the following work.
(1) It introduces the classification process and the basic ideas of the common classification algorithms, pointing out their advantages and disadvantages; briefly introduces the concept of ensemble learning; analyzes the representative algorithms AdaBoost and Bagging; briefly describes selective ensemble learning; and points out the shortcomings of ensemble learning and its development directions, providing the basis for the ensemble methods that follow.
(2) To produce an ensemble with higher accuracy and faster convergence, it proposes a new AdaBoost algorithm, MWBoost, which marks the misclassified instances in each iteration. During boosting, the misclassified samples are carried into the next iteration while the correctly classified instances are resampled, so the algorithm focuses more quickly on the instances that are difficult to classify. The algorithm is tested on several UCI datasets and compared with traditional AdaBoost; the experimental results show that the new algorithm has better classification accuracy and faster convergence.
(3) To produce diverse classifiers, it proposes F-Bagging, a new Bagging ensemble method based on fuzzy clustering. The training instances are clustered by fuzzy clustering; then, according to the membership matrix, an instance whose memberships in several clusters differ only slightly is placed in every cluster that satisfies the condition, so the method takes the real distribution of the instances into account. Finally, a base learner is trained on each cluster. Because the data and class labels differ from cluster to cluster, the base learners have greater diversity. The number of subsets determines the number of learners. Since the instances in the same cluster are highly similar, each trained learner is well suited to classifying such samples, so the learners are weighted according to the ratio of the distances between the test instance and the cluster centers. Experimental results show that the new method achieves better classification performance on pattern recognition.
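The resampling step described in item (2) can be sketched as follows. This covers only how the next training set might be assembled, not the AdaBoost weight update, and it is an assumption-laden illustration: the abstract does not say how many correctly classified instances are drawn, so the `keep_ratio` parameter and the function name are hypothetical.

```python
import random

def next_training_indices(labels, predictions, rng, keep_ratio=0.5):
    """Build the index set for the next boosting round.

    All misclassified instances are kept; correctly classified instances
    are resampled with replacement. keep_ratio is a hypothetical knob,
    not a value taken from the thesis.
    """
    pairs = list(zip(labels, predictions))
    wrong = [i for i, (y, p) in enumerate(pairs) if y != p]
    right = [i for i, (y, p) in enumerate(pairs) if y == p]
    n_draw = int(len(right) * keep_ratio)
    resampled = [rng.choice(right) for _ in range(n_draw)] if right else []
    return wrong + resampled

# Usage: instances 1 and 3 are misclassified and are always carried over.
labels = [0, 0, 1, 1, 1, 0]
predictions = [0, 1, 1, 0, 1, 0]
indices = next_training_indices(labels, predictions, random.Random(0))
```

Because every hard instance survives while the easy ones are thinned out, the share of difficult instances in the training set grows from round to round.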
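For item (3), the sketch below shows one way the overlapping training subsets and the distance-based learner weights could be computed once a fuzzy membership matrix and the cluster centers are available. The tolerance `tol`, the inverse-distance weighting, and both function names are assumptions made for illustration; the abstract states only that memberships with "a little difference" trigger multiple assignment and that weights follow a ratio of distances.

```python
import math

def overlapping_subsets(membership, tol=0.1):
    """Assign each instance to every cluster whose membership is within
    tol of that instance's highest membership (tol is hypothetical)."""
    n_clusters = len(membership[0])
    subsets = [[] for _ in range(n_clusters)]
    for i, row in enumerate(membership):
        top = max(row)
        for k, u in enumerate(row):
            if top - u <= tol:
                subsets[k].append(i)
    return subsets

def learner_weights(x, centers, eps=1e-12):
    """Weight each cluster's learner by normalized inverse distance from
    the test instance to the cluster center (an assumed reading of the
    distance ratio)."""
    inverse = [1.0 / (math.dist(x, c) + eps) for c in centers]
    total = sum(inverse)
    return [v / total for v in inverse]

# Usage: instance 1 has nearly equal memberships, so it joins both clusters.
membership = [[0.9, 0.1], [0.52, 0.48], [0.2, 0.8]]
subsets = overlapping_subsets(membership)
weights = learner_weights((0.0, 0.0), [(1.0, 0.0), (3.0, 0.0)])
```

Under this reading, a learner trained on a cluster close to the test instance receives a larger say in the final prediction than one trained on a distant cluster.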
Keywords/Search Tags: Classification, Ensemble learning, AdaBoost, Bagging, Resample, Fuzzy clustering, Membership