
Research On Label Noise Based On Ensemble Learning

Posted on: 2016-04-21
Degree: Master
Type: Thesis
Country: China
Candidate: C C Yuan
Full Text: PDF
GTID: 2348330488455682
Subject: Circuits and Systems
Abstract/Summary:
Ensemble learning, also called multiple classifier systems, tries to cope with a single learning problem by training multiple base learners. As a machine learning paradigm, its motivation is to improve generalization performance. The accuracy of classifiers, however, is always degraded by noisy instances. There are two ways to deal with label noise in training datasets: label-noise-robust algorithms and label-noise cleansing algorithms. The work in this thesis falls into both categories. The main contributions are as follows:

(1) A new label-noise-robust ensemble learning method based on condensed nearest neighbors (En-CNN) is proposed. The CNN (condensed nearest neighbors) rule builds a subset of the training instances that allows all remaining training instances to be classified correctly. Because the 1-nearest-neighbor classifier is sensitive to noise, noisy instances tend to be absorbed into this subset, so its complement is a clean and useful subset. Different orderings of the training set generally yield different complementary subsets under the CNN rule. An ensemble of base classifiers, each trained on a different complementary subset, is then applied to noisy datasets. Experiments show that on noisy datasets the proposed method achieves better classification performance than classic ensemble methods such as Bagging, AdaBoost, and Random Forest, which demonstrates its robustness to label noise.

(2) The fourth chapter presents an ensemble filtering algorithm based on Random Forest and dataset partitioning. First, a novel majority filtering method based on Random Forest is proposed, whose classifiers are more accurate and more robust than those of standard majority filtering. Then, several sets of potentially mislabeled samples are obtained by applying this majority filtering process to random partitions of the training dataset. Finally, the mislabeled samples are confirmed by majority voting over the samples in these candidate sets. Experiments show that the method detects more label-noise samples while removing fewer correctly labeled samples.

(3) Because AdaBoost is sensitive to label noise, we propose a Majority Filtering-AdaBoost algorithm to restrain the excessive weight growth of mislabeled samples. Majority Filtering assigns each training sample a confidence, and the original instance-weight update is then revised with this confidence so that the weights of noisy instances stop increasing excessively. Experiments show better classification performance than other classic algorithms.
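The CNN-based construction behind En-CNN (contribution 1) can be sketched as follows. This is a minimal illustration, not the thesis implementation: the 1-NN base learner, the number of ensemble members, and all function names here are assumptions.

```python
import numpy as np

def cnn_complement(X, y, rng):
    """Condensed nearest neighbors: greedily grow a 'store' that classifies
    every other training point correctly with 1-NN, then return the indices
    NOT absorbed into the store (the complementary, mostly clean subset)."""
    order = rng.permutation(len(X))
    store = [order[0]]
    changed = True
    while changed:
        changed = False
        for i in order:
            if i in store:
                continue
            # 1-NN prediction of point i using the current store
            d = np.linalg.norm(X[np.array(store)] - X[i], axis=1)
            if y[store[int(np.argmin(d))]] != y[i]:
                store.append(i)  # misclassified -> absorbed into the store
                changed = True
    return np.setdiff1d(order, store)

def en_cnn_predict(X, y, X_test, n_members=5, seed=0):
    """Majority vote of 1-NN base classifiers, each trained on the
    complementary subset produced by a differently shuffled CNN pass."""
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(n_members):
        keep = cnn_complement(X, y, rng)
        Xs, ys = X[keep], y[keep]
        d = np.linalg.norm(X_test[:, None, :] - Xs[None, :, :], axis=2)
        votes.append(ys[np.argmin(d, axis=1)])
    votes = np.array(votes)
    # per-test-point majority vote over the ensemble members
    return np.array([np.bincount(v).argmax() for v in votes.T])
```

Because each shuffle changes which points the CNN rule absorbs, the complementary subsets (and hence the base classifiers) differ, which is what gives the ensemble its diversity.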
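The partition-and-vote filtering of contribution (2) can be sketched generically. Here a small k-NN committee stands in for the Random Forest used in the thesis, and the partition counts, fold counts, and function names are illustrative assumptions.

```python
import numpy as np

def knn_predict(Xtr, ytr, Xte, k=3):
    """k-NN majority vote; stands in for the Random Forest of the thesis."""
    d = np.linalg.norm(Xte[:, None, :] - Xtr[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(ytr[row]).argmax() for row in idx])

def majority_filter(X, y, n_partitions=5, n_folds=3, seed=0):
    """Flag a sample as label noise when, across several random partitions
    of the training set, a majority of the out-of-fold committees
    misclassify it."""
    rng = np.random.default_rng(seed)
    flags = np.zeros(len(X), dtype=int)
    for _ in range(n_partitions):
        # random assignment of each sample to one of n_folds folds
        folds = rng.permutation(len(X)) % n_folds
        for f in range(n_folds):
            test = folds == f
            pred = knn_predict(X[~test], y[~test], X[test])
            flags[np.where(test)[0]] += (pred != y[test])
    # final vote: flagged in a majority of the partitions
    return np.where(flags > n_partitions / 2)[0]
```

Requiring a majority over several independent partitions is what lets the filter remove fewer correctly labeled samples: a clean point misclassified in one unlucky split is not condemned by that split alone.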
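The confidence-tempered weight update of contribution (3) can be sketched on top of a stump-based AdaBoost. The exact form of the revised update is the thesis's own; scaling the exponent by the per-sample confidence, and all names below, are assumptions made for illustration.

```python
import numpy as np

def best_stump(X, y, w):
    """Weighted decision stump: best (feature, threshold, polarity)."""
    best = (0, 0.0, 1, np.inf)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, f] - t) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (f, t, pol, err)
    return best

def mf_adaboost(X, y, conf, n_rounds=10):
    """AdaBoost (labels in {-1,+1}) whose weight update is damped by a
    Majority-Filtering confidence in [0,1]: samples the filter suspects
    (low confidence) have their weight growth restrained."""
    n = len(X)
    w = np.full(n, 1.0 / n)
    model = []
    for _ in range(n_rounds):
        f, t, pol, err = best_stump(X, y, w)
        if err >= 0.5:
            break
        err = max(err, 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)
        pred = np.where(pol * (X[:, f] - t) >= 0, 1, -1)
        # assumed revision: scale the exponent by the sample confidence,
        # so suspected-noisy samples (conf near 0) stop gaining weight
        w *= np.exp(-alpha * conf * y * pred)
        w /= w.sum()
        model.append((alpha, f, t, pol))
    return model

def ada_predict(model, X):
    s = sum(a * np.where(p * (X[:, f] - t) >= 0, 1, -1)
            for a, f, t, p in model)
    return np.where(s >= 0, 1, -1)
```

With all confidences equal to 1 this reduces to standard AdaBoost; a confidence near 0 freezes a suspected-noisy sample's weight, which is exactly the "restrain the excessive increase" behavior described above.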
Keywords/Search Tags: Label noise, Ensemble learning, Base classifiers, AdaBoost, Random Forest, Voting Filtering