
Research On Sample Denoising In Semi-Supervised Co-Training Algorithm

Posted on: 2022-08-08
Degree: Master
Type: Thesis
Country: China
Candidate: X Gong
Full Text: PDF
GTID: 2518306530962469
Subject: Computer application technology
Abstract/Summary:
Compared with traditional machine learning methods, the advantage of semi-supervised learning is that it can exploit scarce labeled samples and massive unlabeled samples simultaneously to train a model. Semi-supervised learning inherits the strengths of both supervised and unsupervised learning while avoiding their shortcomings, and thereby improves the generalization and accuracy of the model. Co-training is an important research direction in semi-supervised learning: its main idea is to train two classifiers on two sufficient and redundant views and to classify unlabeled samples through iterative cooperation between the classifiers. The co-training algorithm benefits from multi-view complementarity and performs well and robustly when the training data consists of few labeled samples and a large number of unlabeled samples, so it has been widely studied and applied in many areas.

However, noise remains the key obstacle to improving the performance of co-training, and it arises from several sources. Noisy samples in the initial training set cause large errors in the early stage of training; these errors accumulate as training progresses, forming a vicious circle. View segmentation that ignores noisy features introduces additional noise and consumes substantial time and memory on high-dimensional data. When the data lacks two sufficient and redundant views, the central problem is how to segment the views effectively, so that two independent and complete classifiers can be trained to cooperate well while avoiding the noise introduced by a weak classifier during classification. Handling samples that the two view classifiers label inconsistently is likewise key to reducing noise during iteration.

This paper studies the problem of sample denoising in semi-supervised co-training. The main research contributions are as follows.

(1) In the standard co-training algorithm, insufficient redundancy in view segmentation leads to error accumulation in the two classifiers and inconsistent classification of the same unlabeled samples. To address this, a co-training algorithm combining the information gain ratio and K-means clustering is proposed. The information gain ratio of each feature is computed on the labeled samples, and the features with high gain ratio are divided evenly between the two views; this avoids over-fitting and resolves the insufficient redundancy of view segmentation. K-means clustering is then used to find the cluster of each inconsistently labeled sample, and the sample is relabeled on the principle that samples in the same cluster are most similar to one another.

(2) A co-training algorithm based on weighted principal component analysis and improved density peak clustering is proposed. Building on traditional principal component analysis, the method introduces feature weight coefficients to represent the importance of each feature; low-weight features are treated as noisy features that generate interfering information and are removed. The key features are then divided evenly between the two views during view segmentation, so that the two classifiers cooperate more effectively. Finally, improved density peak clustering determines the category of inconsistently labeled samples, which effectively reduces the probability that mislabeled samples become noise.

(3) Aiming at a more flexible and systematic mechanism for handling noise in co-training, a co-training algorithm based on adaptive data-density editing is proposed. First, a novel noise filter is built on data density, which recognizes boundary noise and outlier samples well. A monitoring quantity is then maintained for each unlabeled sample to assess the credibility of its assigned category, which helps ensure samples are labeled correctly from the start and limits the amount of noise introduced. Finally, an adaptive editing strategy based on PAC theory and the monitoring quantity is integrated into the co-training framework: in each training round, the method automatically applies the appropriate noise-processing mechanism according to the amount and state of the noise, reducing the classification error rate while increasing the number of labeled samples.

Experimental results on 12 UCI data sets demonstrate the effectiveness of the proposed algorithms.
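The standard co-training loop the thesis builds on — two classifiers trained on disjoint feature views, each labeling its most confident unlabeled samples for the shared pool — can be sketched as follows. This is a minimal illustration, not the thesis's method: the nearest-centroid classifier, the margin-based confidence, and the per-round quota are all assumptions made for the sketch.

```python
import numpy as np

class CentroidClassifier:
    """Toy stand-in for a view classifier: predicts the nearest class centroid."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_with_confidence(self, X):
        # distance from each sample to each class centroid
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        idx = d.argmin(axis=1)
        sd = np.sort(d, axis=1)
        # confidence = margin between the two nearest centroids
        conf = sd[:, 1] - sd[:, 0] if d.shape[1] > 1 else np.ones(len(X))
        return self.classes_[idx], conf

def co_train(X, labeled_mask, y, view1, view2, rounds=5, per_round=2):
    """Grow the labeled set iteratively: each view's classifier labels its
    most confident unlabeled samples and adds them to the shared pool."""
    labeled = labeled_mask.copy()
    labels = y.copy()  # entries where labeled is False are ignored until set
    for _ in range(rounds):
        unlabeled = np.where(~labeled)[0]
        if len(unlabeled) == 0:
            break
        for view in (view1, view2):
            clf = CentroidClassifier().fit(X[labeled][:, view], labels[labeled])
            pred, conf = clf.predict_with_confidence(X[unlabeled][:, view])
            top = np.argsort(-conf)[:per_round]  # most confident samples
            labels[unlabeled[top]] = pred[top]
            labeled[unlabeled[top]] = True
            unlabeled = np.where(~labeled)[0]
            if len(unlabeled) == 0:
                break
    return labels, labeled
```

In the full algorithm each view's classifier would of course be stronger than a centroid rule, and — as the abstract stresses — the samples the two views label inconsistently need their own denoising step.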
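The view-segmentation idea in contribution (1) — rank features by information gain ratio on the labeled samples, then deal the top-ranked features alternately into two views so both views stay informative — might be sketched like this. The median-threshold binarization of continuous features is an illustrative assumption, not the thesis's discretization.

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a label vector, in bits."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def gain_ratio(feature, y):
    """Information gain ratio of a feature binarized at its median."""
    split = feature > np.median(feature)
    h = entropy(y)
    cond = 0.0  # conditional entropy H(y | split)
    iv = 0.0    # intrinsic value (split information)
    for side in (split, ~split):
        if side.sum() == 0:
            continue
        w = side.mean()
        cond += w * entropy(y[side])
        iv -= w * np.log2(w)
    gain = h - cond
    return gain / iv if iv > 0 else 0.0

def split_views(X, y, keep=None):
    """Rank features by gain ratio; deal them alternately into two views.
    `keep` optionally drops the lowest-ranked (presumed noisy) features."""
    scores = np.array([gain_ratio(X[:, j], y) for j in range(X.shape[1])])
    order = np.argsort(-scores)
    if keep is not None:
        order = order[:keep]
    return order[0::2].tolist(), order[1::2].tolist()
```

Alternating assignment gives both views a comparable share of high-gain-ratio features, which is one plausible reading of the abstract's "evenly divided into two views".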
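Contribution (3) rests on a density-based noise filter for boundary noise and outliers. One plausible reading — flag a sample when its local density (inverse k-th-neighbour distance) is very low, or when its k nearest neighbours mostly disagree with its label — is sketched below; the k-NN density estimate, the quantile threshold, and the agreement threshold are all assumptions, not the thesis's actual filter.

```python
import numpy as np

def density_noise_filter(X, y, k=3, density_quantile=0.1, agreement=0.5):
    """Flag samples as noise when their local density is very low (outliers)
    or their k nearest neighbours mostly carry a different label (boundary noise)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]           # indices of k nearest neighbours
    kth = np.take_along_axis(d, nn[:, -1:], axis=1).ravel()
    density = 1.0 / (kth + 1e-12)               # inverse k-th neighbour distance
    low_density = density < np.quantile(density, density_quantile)
    agree = (y[nn] == y[:, None]).mean(axis=1)  # neighbour label agreement
    return low_density | (agree < agreement)
```

Note the density cut-off is relative (a bottom quantile), so even clean data would see its sparsest points flagged; an absolute threshold, or the adaptive monitoring quantity the abstract describes, would refine this.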
Keywords/Search Tags:co-training, noise processing, weighted PCA, adaptive editing, PAC theory