Research On Class Noise Detection Algorithm Based On Ensemble Learning

Posted on: 2019-04-20
Degree: Master
Type: Thesis
Country: China
Candidate: H Q Wei
Full Text: PDF
GTID: 2428330596950398
Subject: Software engineering
Abstract/Summary:
Classification is a major topic in data mining. A classification task typically trains a model on a large number of labeled instances and then uses that model to predict the labels of unseen data. Two factors affect classification accuracy in this process: the quality of the classification algorithm and the quality of the training set. Once the classification algorithm is fixed, the quality of the training set is the only factor that affects the performance of the classification model.

The quality of the training set is in turn influenced by two factors: noisy data and the number of labeled instances. Noise can be divided into attribute noise and class noise; it has been shown that eliminating attribute noise decreases classification accuracy, whereas eliminating class noise improves it. As for the number of labeled instances, the generalization error of the classification model increases as the number of labeled instances decreases. In practice, labeled instances are far fewer than unlabeled instances, and many of them carry class noise. However, most existing research addresses either class noise in classification or classification with a small number of labeled instances; little of it addresses classification with few labeled instances that also contain class noise. This thesis designs algorithms to improve classification accuracy when only a few labeled instances, some of them affected by class noise, are available. Moreover, an ensemble of classifiers is more accurate than any single classifier in most cases. The specific research work is as follows:

(1) A noise detection algorithm based on ensemble learning and semi-supervised learning. Semi-supervised learning is used to expand the labeled dataset, and several methods are then used to generate multiple base classifiers for the ensemble. The main framework of the algorithm applies multiple layers of voting to filter class noise thoroughly. Because soft voting is used in each layer of voting, the final cleaned training set is more reliable than that produced by general methods.

(2) A noise detection algorithm based on ensemble learning and active learning. Active learning selects and labels the unlabeled instances with high information density, expanding the labeled dataset so that high class noise detection accuracy is achieved at minimum labeling cost. The algorithm also analyzes the detected class noise set to avoid mistakenly deleting correct data. In addition, the whole algorithm runs iteratively, which filters out class noise more thoroughly.
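The abstract does not give the voting rule itself, so the following is only a minimal sketch of a single soft-voting layer for class noise detection, not the thesis's algorithm. It assumes each base classifier outputs a class-probability estimate per instance. The names `soft_vote` and `flag_class_noise` and the `threshold` parameter are hypothetical, introduced here for illustration.

```python
from typing import Dict, List, Sequence

Proba = Dict[str, float]  # class label -> predicted probability


def soft_vote(probas: Sequence[Proba]) -> Proba:
    """Average the class-probability estimates of several base classifiers."""
    labels = set().union(*(p.keys() for p in probas))
    return {c: sum(p.get(c, 0.0) for p in probas) / len(probas) for c in labels}


def flag_class_noise(
    given_labels: Sequence[str],
    ensemble_probas: Sequence[Sequence[Proba]],
    threshold: float = 0.5,  # assumed confidence cut-off, not from the thesis
) -> List[int]:
    """Return indices of instances whose given label the soft vote contradicts.

    ensemble_probas[i] holds one probability dict per base classifier
    for instance i.
    """
    suspects = []
    for i, (label, probas) in enumerate(zip(given_labels, ensemble_probas)):
        avg = soft_vote(probas)
        predicted = max(avg, key=avg.get)
        # Flag only when the ensemble disagrees with the label confidently.
        if predicted != label and avg[predicted] >= threshold:
            suspects.append(i)
    return suspects


# Toy data: three instances, three base classifiers each.
labels = ["pos", "neg", "pos"]
ensemble_probas = [
    [{"pos": 0.9, "neg": 0.1}, {"pos": 0.8, "neg": 0.2}, {"pos": 0.7, "neg": 0.3}],
    [{"pos": 0.6, "neg": 0.4}, {"pos": 0.45, "neg": 0.55}, {"pos": 0.4, "neg": 0.6}],
    [{"pos": 0.1, "neg": 0.9}, {"pos": 0.3, "neg": 0.7}, {"pos": 0.2, "neg": 0.8}],
]
suspects = flag_class_noise(labels, ensemble_probas)
print(suspects)  # prints [2]
```

In the second instance one classifier favors "pos", but the averaged probabilities still favor the given label "neg", so the instance is kept. A hard majority vote would reach the same decision here but discards the confidence information that soft voting retains.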
Keywords/Search Tags:class noise, ensemble learning, active learning, semi-supervised learning, soft voting