
Research On Label Noise Based On Ensemble Learning

Posted on: 2016-04-21
Degree: Master
Type: Thesis
Country: China
Candidate: C C Yuan
Full Text: PDF
GTID: 2348330488455682
Subject: Circuits and Systems
Abstract/Summary:
Ensemble learning, also called multiple classifier systems, tries to cope with a single learning problem by training multiple base learners. As a machine learning paradigm, its motivation is to improve generalization performance. The accuracy of classifiers, however, is always degraded by noisy instances. There are two ways to deal with label noise in training datasets: label-noise-robust algorithms and label-noise cleansing algorithms. The work in this thesis falls into both categories. The main contributions are as follows:

(1) A new label-noise-robust ensemble learning method based on condensed nearest neighbors (En-CNN) is proposed. The CNN (condensed nearest neighbors) rule builds a subset of the training instances that allows all remaining training instances to be classified correctly. Because the 1-nearest-neighbor classifier is sensitive to noise, noisy instances tend to be absorbed into this subset, so its complement is a clean and useful subset. Different orderings of the training set generally yield different complementary subsets under the CNN rule. An ensemble of base classifiers, each trained on a different complementary subset, is then applied to noisy datasets. Experiments show that on noisy datasets the proposed method achieves better classification performance than classic ensemble methods such as Bagging, AdaBoost, and Random Forest, which demonstrates its robustness to label noise.

(2) The fourth chapter presents an ensemble filtering algorithm based on Random Forest and dataset partitioning. First, a novel majority filtering method based on Random Forest is proposed, whose classifiers are more accurate and more robust than those of standard majority filtering. Then, several sets of potentially mislabeled samples are obtained by applying this majority filtering process to random partitions of the training dataset. Finally, the mislabeled samples are confirmed by majority voting over the samples in these candidate sets. Experiments show that the method detects more label-noise samples while removing fewer correctly labeled samples.

(3) Because AdaBoost is sensitive to label noise, we propose a Majority Filtering-AdaBoost algorithm to restrain the excessive weight growth of mislabeled samples. Majority Filtering assigns each training sample a confidence, and the original instance-weight update is then revised with this confidence so that the weights of noisy instances stop increasing excessively. Experiments show better classification performance than other classic algorithms.
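The CNN-based construction behind En-CNN (contribution 1) can be sketched as follows. This is a minimal illustration, not the thesis implementation: the 1-NN base learner, the number of ensemble members, and all function names here are assumptions.

```python
import numpy as np

def cnn_complement(X, y, rng):
    """Condensed nearest neighbors: greedily grow a 'store' that classifies
    every other training point correctly with 1-NN, then return the indices
    NOT absorbed into the store (the complementary, mostly clean subset)."""
    order = rng.permutation(len(X))
    store = [order[0]]
    changed = True
    while changed:
        changed = False
        for i in order:
            if i in store:
                continue
            # 1-NN prediction of point i using the current store
            d = np.linalg.norm(X[np.array(store)] - X[i], axis=1)
            if y[store[int(np.argmin(d))]] != y[i]:
                store.append(i)  # misclassified -> absorbed into the store
                changed = True
    return np.setdiff1d(order, store)

def en_cnn_predict(X, y, X_test, n_members=5, seed=0):
    """Majority vote of 1-NN base classifiers, each trained on the
    complementary subset produced by a differently shuffled CNN pass."""
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(n_members):
        keep = cnn_complement(X, y, rng)
        Xs, ys = X[keep], y[keep]
        d = np.linalg.norm(X_test[:, None, :] - Xs[None, :, :], axis=2)
        votes.append(ys[np.argmin(d, axis=1)])
    votes = np.array(votes)
    # per-test-point majority vote over the ensemble members
    return np.array([np.bincount(v).argmax() for v in votes.T])
```

Because each shuffle changes which points the CNN rule absorbs, the complementary subsets (and hence the base classifiers) differ, which is what gives the ensemble its diversity.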
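The partition-and-vote filtering of contribution (2) can be sketched generically. Here a small k-NN committee stands in for the Random Forest used in the thesis, and the partition counts, fold counts, and function names are illustrative assumptions.

```python
import numpy as np

def knn_predict(Xtr, ytr, Xte, k=3):
    """k-NN majority vote; stands in for the Random Forest of the thesis."""
    d = np.linalg.norm(Xte[:, None, :] - Xtr[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(ytr[row]).argmax() for row in idx])

def majority_filter(X, y, n_partitions=5, n_folds=3, seed=0):
    """Flag a sample as label noise when, across several random partitions
    of the training set, a majority of the out-of-fold committees
    misclassify it."""
    rng = np.random.default_rng(seed)
    flags = np.zeros(len(X), dtype=int)
    for _ in range(n_partitions):
        # random assignment of each sample to one of n_folds folds
        folds = rng.permutation(len(X)) % n_folds
        for f in range(n_folds):
            test = folds == f
            pred = knn_predict(X[~test], y[~test], X[test])
            flags[np.where(test)[0]] += (pred != y[test])
    # final vote: flagged in a majority of the partitions
    return np.where(flags > n_partitions / 2)[0]
```

Requiring a majority over several independent partitions is what lets the filter remove fewer correctly labeled samples: a clean point misclassified in one unlucky split is not condemned by that split alone.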
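The confidence-tempered weight update of contribution (3) can be sketched on top of a stump-based AdaBoost. The exact form of the revised update is the thesis's own; scaling the exponent by the per-sample confidence, and all names below, are assumptions made for illustration.

```python
import numpy as np

def best_stump(X, y, w):
    """Weighted decision stump: best (feature, threshold, polarity)."""
    best = (0, 0.0, 1, np.inf)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, f] - t) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (f, t, pol, err)
    return best

def mf_adaboost(X, y, conf, n_rounds=10):
    """AdaBoost (labels in {-1,+1}) whose weight update is damped by a
    Majority-Filtering confidence in [0,1]: samples the filter suspects
    (low confidence) have their weight growth restrained."""
    n = len(X)
    w = np.full(n, 1.0 / n)
    model = []
    for _ in range(n_rounds):
        f, t, pol, err = best_stump(X, y, w)
        if err >= 0.5:
            break
        err = max(err, 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)
        pred = np.where(pol * (X[:, f] - t) >= 0, 1, -1)
        # assumed revision: scale the exponent by the sample confidence,
        # so suspected-noisy samples (conf near 0) stop gaining weight
        w *= np.exp(-alpha * conf * y * pred)
        w /= w.sum()
        model.append((alpha, f, t, pol))
    return model

def ada_predict(model, X):
    s = sum(a * np.where(p * (X[:, f] - t) >= 0, 1, -1)
            for a, f, t, p in model)
    return np.where(s >= 0, 1, -1)
```

With all confidences equal to 1 this reduces to standard AdaBoost; a confidence near 0 freezes a suspected-noisy sample's weight, which is exactly the "restrain the excessive increase" behavior described above.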
Keywords/Search Tags: Label noise, Ensemble learning, Base classifiers, AdaBoost, Random Forest, Voting Filtering