
Research On Collaborative Training Algorithm Based On Noise Filtering

Posted on: 2016-12-17    Degree: Master    Type: Thesis
Country: China    Candidate: X T Zou    Full Text: PDF
GTID: 2308330461468118    Subject: Computer software and theory

Abstract/Summary:
Semi-supervised learning builds and strengthens classifiers from a few expensive labeled samples together with many cheap unlabeled samples; it is one of the most important families of machine learning algorithms, and in recent years it has gradually become one of the hottest research topics in the machine learning domain.

Among the many semi-supervised learning methods, collaborative training (co-training) is a sub-branch of the semi-supervised paradigm that has made great progress. Collaborative training usually trains at least two classifiers on the initial labeled sample set. Each classifier is then assigned in turn as the main classifier, with the remaining classifiers acting as auxiliary classifiers. The auxiliary classifiers label unlabeled samples and pass the labels with the highest labeling confidence to the main classifier, which then retrains itself on the updated labeled sample set. Because it comprehensively exploits multiple views of the same samples and the predictions of multiple classifiers, collaborative training can usually reach a very high classification accuracy. However, in most collaborative training algorithms, especially in the initial stage, the small number of labeled samples means the initial accuracy of the base classifiers trained on them is usually very low. These classifiers may assign incorrect labels to unlabeled samples, introducing noisy labels into the later iterations and thereby deteriorating the accuracy of co-training.

To solve these problems, inspired by sample distribution information and the sample selection methods of active learning, this paper defines strategies based on sample representativeness and sample informativeness and incorporates them into traditional co-training algorithms, aiming to boost both the efficiency and the classification accuracy of co-training. The main work of this paper covers the following two aspects:

(1) We propose a co-training algorithm based on sample representativeness, EnCoTrain for short. To alleviate the introduction of noisy data, we define a sample representativeness measure and, based on it, a new co-training algorithm with a noise-filtering function. Specifically, in each iteration of the collaborative training procedure, we compute the representativeness of the unlabeled samples that the auxiliary classifiers have labeled with agreement; the newly labeled samples with the highest representativeness are then used to retrain the main classifier (a minimal code sketch of this loop appears at the end of the abstract). To verify the performance of the proposed method, we compare EnCoTrain with the original co-training, tri-training, co-forest, and others. Experiments on UCI datasets show that EnCoTrain effectively enhances the accuracy of these co-training algorithms.

(2) Furthermore, we introduce a boosting co-training algorithm that selects informative and representative samples, named Boost-CoTrain. Inspired by the sample selection strategy of active learning, we work out a function that measures the labeling uncertainty of unlabeled samples. In detail, in each round of co-training, the unlabeled samples with the highest labeling uncertainty are handed to the auxiliary classifiers for labeling, and these newly labeled samples are then used to retrain the main classifier (one possible form of such an uncertainty measure is sketched below).
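The abstract does not spell out the thesis's exact uncertainty function, but a margin-based score is a common choice in active learning. The sketch below is only an illustration of what such a measure could look like; the function name labeling_uncertainty and the scikit-learn-style predict_proba interface are our assumptions, not the thesis's notation.

```python
import numpy as np

def labeling_uncertainty(clf, X_unl):
    """Margin-based uncertainty: 1 minus the gap between the two
    largest predicted class probabilities, so the samples the
    classifier is least sure about score highest."""
    proba = clf.predict_proba(X_unl)        # shape (n_samples, n_classes)
    top2 = np.sort(proba, axis=1)[:, -2:]   # two largest probabilities per row
    return 1.0 - (top2[:, 1] - top2[:, 0])  # higher value = more uncertain

# In each co-training round, the k most uncertain unlabeled samples
# would then be routed to the auxiliary classifiers for labeling, e.g.:
# idx = np.argsort(labeling_uncertainty(main_clf, X_unl))[-k:]
```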
In experiments, we compare Boost-CoTrain with four algorithms: the original co-training, Boost-CoTrain with sample informativeness alone, Boost-CoTrain with sample representativeness alone, and Boost-CoTrain without weighting. The experimental results show that the Boost-CoTrain algorithm effectively improves the performance of collaborative training.
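As promised under point (1), here is a minimal, self-contained sketch of a representativeness-filtered co-training loop. It is a sketch under stated assumptions, not the thesis's implementation: the density-style representativeness measure (mean RBF similarity to the unlabeled pool), the use of two differently configured decision trees in place of genuinely distinct views, and all names (en_co_train_sketch, rounds, k) are ours.

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.tree import DecisionTreeClassifier

def representativeness(X_cand, X_pool, gamma=0.1):
    # Density proxy: mean RBF similarity of each candidate to the whole
    # unlabeled pool. The thesis's exact measure is not given in the
    # abstract; this stands in as one plausible density-based choice.
    return rbf_kernel(X_cand, X_pool, gamma=gamma).mean(axis=1)

def en_co_train_sketch(X_lab, y_lab, X_unl, rounds=10, k=20):
    # Two differently configured trees stand in for the multiple
    # views/classifiers used in the thesis.
    clfs = [DecisionTreeClassifier(max_depth=d, random_state=0).fit(X_lab, y_lab)
            for d in (3, None)]
    for _ in range(rounds):
        if len(X_unl) == 0:
            break
        preds = np.array([c.predict(X_unl) for c in clfs])
        agree = preds[0] == preds[1]            # keep only agreed-upon labels
        if not agree.any():
            break
        cand = np.where(agree)[0]
        rep = representativeness(X_unl[cand], X_unl)
        top = cand[np.argsort(rep)[-k:]]        # most representative candidates
        X_lab = np.vstack([X_lab, X_unl[top]])  # grow the labeled set
        y_lab = np.concatenate([y_lab, preds[0][top]])
        X_unl = np.delete(X_unl, top, axis=0)
        clfs = [clone(c).fit(X_lab, y_lab) for c in clfs]  # retrain all classifiers
    return clfs
```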
Keywords/Search Tags:Semi-supervised Learning, Collaborative Training, Noise Filtering, Uncertainty Sampling, Sample Representativeness and Informativeness