
Image Data Cleaning And Feature Learning In The Presence Of Label Noise

Posted on: 2020-05-29 | Degree: Master | Type: Thesis
Country: China | Candidate: W N Zhang | Full Text: PDF
GTID: 2428330590472661 | Subject: Computer Science and Technology
Abstract/Summary:
Machine learning and deep learning are now widely used across many application fields, and a large dataset with reliable labels is a basic requirement for any supervised learning task. In practice, however, collected data often carry some degree of label noise, that is, some samples are assigned the wrong label. To address label noise in image data for computer vision, this thesis studies image data cleaning and robust feature learning methods. The specific work and contributions are as follows: (1) The concepts related to label noise and the common techniques for handling it are introduced and analyzed in detail, together with the basic theory of deep autoencoder networks and generative adversarial networks. (2) A data cleaning model based on anomaly detection and reconstruction error minimization is proposed to counter the loss of prediction performance caused by the high false-positive rate of traditional label noise detection methods. Candidate noisy-label samples are first obtained with an anomaly detection technique, and the truly mislabeled samples are then selected from these candidates according to the reconstruction error minimization criterion. (3) To limit the influence of label noise on feature learning, a robust feature learning framework based on class-specific autoencoders is proposed. The framework contains three modules: a data augmentation strategy based on generative adversarial networks, an optimization strategy based on importance weighting, and an iteration strategy based on the minimum reconstruction error. Extensive validation experiments show that each of the three strategies reduces the influence of label noise on feature learning to some extent. (4) On the MNIST handwritten digit dataset and the Caltech-10 image dataset, the proposed models are compared with state-of-the-art data cleaning models and label-noise-robust models. The proposed models are validated on a data cleaning task on the training set and on a classification task on the test set, respectively.
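
The two-stage cleaning idea in contribution (2) can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes a per-class PCA model as a linear stand-in for a class-specific deep autoencoder, and a per-class quantile threshold on reconstruction error as a stand-in for the anomaly detector. The function names (`fit_class_model`, `recon_error`, `clean_labels`), the parameters `k` and `quantile`, and the toy data are all hypothetical.

```python
import numpy as np

def fit_class_model(X, k):
    """Linear stand-in for a class-specific autoencoder (assumption):
    the class mean plus the top-k principal directions."""
    mu = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def recon_error(X, model):
    """Squared reconstruction error of each row of X under one class model."""
    mu, V = model
    Z = (X - mu) @ V.T       # encode
    X_hat = Z @ V + mu       # decode
    return np.sum((X - X_hat) ** 2, axis=1)

def clean_labels(X, y, k=1, quantile=0.8):
    """Return (noisy_mask, suggested_label) for every sample."""
    classes = np.unique(y)
    models = [fit_class_model(X[y == c], k) for c in classes]
    # errors[i, j]: error of sample i under the model of classes[j]
    errors = np.stack([recon_error(X, m) for m in models], axis=1)
    own = errors[np.arange(len(y)), np.searchsorted(classes, y)]

    # Stage 1 (candidate detection): samples that their own class
    # reconstructs unusually badly, judged per class.
    candidate = np.zeros(len(y), dtype=bool)
    for c in classes:
        mask = y == c
        candidate[mask] = own[mask] > np.quantile(own[mask], quantile)

    # Stage 2 (minimum reconstruction error): keep a candidate only if
    # some other class reconstructs it better than its given class.
    best = classes[np.argmin(errors, axis=1)]
    return candidate & (best != y), best

# Toy data (hypothetical): two classes on different lines in 5-D space.
rng = np.random.default_rng(0)
n, d = 100, 5
X0 = rng.normal(scale=0.05, size=(n, d))
X0[:, 0] += rng.normal(scale=3.0, size=n)   # class 0 varies along axis 0
X1 = rng.normal(scale=0.05, size=(n, d))
X1[:, 1] += rng.normal(scale=3.0, size=n)   # class 1 varies along axis 1
X1[:, 2] += 5.0                             # and is offset along axis 2
X = np.vstack([X0, X1])
y_true = np.repeat([0, 1], n)

# Flip the labels of the first 10 samples of each class.
flipped = np.zeros(2 * n, dtype=bool)
flipped[:10] = True
flipped[n:n + 10] = True
y_noisy = np.where(flipped, 1 - y_true, y_true)

noisy, suggested = clean_labels(X, y_noisy)
print("flagged:", int(noisy.sum()), "of", int(flipped.sum()), "flipped labels")
```

In this sketch the first stage deliberately over-selects (the top 20% of each labeled class), and the second stage discards candidates that their given class still reconstructs best. This mirrors the abstract's motivation of lowering the false-positive rate of plain anomaly detection.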
Keywords/Search Tags: Label noise, Data cleaning, Robust feature learning, Deep autoencoder, Reconstruction error