Research On Learning From Image Dataset With Noisy Labels

Posted on:2019-06-01

Degree:Master

Type:Thesis

Country:China

Candidate:X M Qin

Full Text:PDF

GTID:2348330563453931

Subject:Computer software and theory

Abstract/Summary:

With the development of artificial intelligence, image classification plays an important role in daily life and is a core technology in face recognition, object localization and medical diagnosis. However, the data that image classification research depends on are all manually labeled, which is both expensive and time-consuming and greatly limits the potential of image classification. It is therefore very important to train high-precision, highly robust models on low-cost data. The internet produces massive amounts of image data every day. These data are simple to collect, highly diverse, and often carry additional semantic metadata, but they include images with noisy labels. This thesis therefore studies image classification methods for data with noisy labels. The main work is divided into the following three parts:

(i) "Data purification" of the original dataset, which selects a small, representative sample of "simple", "clean" images. First, we design "positive and negative" word-class features and parse the original text into word vectors. We then use our mixed distance similarity algorithm to calculate the similarity between the word vectors, select the "clean" baseline dataset, and finally train the baseline model.

(ii) Design of a simple-to-complex "curriculum learning" strategy for the WebVision dataset. The baseline model is used to extract features and perform probabilistic prediction, followed by PCA dimensionality reduction, t-SNE visualization and hierarchical cluster analysis. We then re-divide the dataset to form several subsets of the original data, and finally train models using the Inception-v3 and ResNet-50 architectures.

(iii) Experiments and comparative analysis on the wv-40 dataset. The Q10_denos model, trained on the final dataset after "denoising", scores 5.9% higher than the target model trained on the original noise-containing dataset, which shows that the algorithm proposed in this thesis learns better representations and is more robust. It also scores 2.35% higher than the Q10 model trained without re-dividing the dataset, which verifies that clustering and re-dividing the data improves the accuracy of the model. Furthermore, the Q10_denos model scores 5% higher than the TF_BL model, which fine-tunes the target model on the baseline dataset. This result shows that the algorithm of this thesis outperforms previous methods. To analyze the feature representation ability of the Q10_denos model intuitively, this thesis applies guided backpropagation to images of Tench, Bulbul, Terrapin and other classes. The visualizations show that the model learns the contours of objects very well.

In summary, the proposed algorithm is very effective for classifying images with noisy labels, especially when many classes have noisy labels.
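
The "data purification" step in part (i) can be illustrated with a minimal sketch. The abstract does not define the mixed distance similarity algorithm, so everything here is an assumption made for illustration: the blend of cosine similarity with a Euclidean-distance-based similarity, the weight `alpha`, the `threshold`, and the helper names `mixed_similarity` and `select_clean_samples`. It shows only the general idea of keeping samples whose text vector is close to the word vector of their labeled class.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def euclidean_similarity(u, v):
    """Map Euclidean distance into (0, 1]; identical vectors give 1.0."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return 1.0 / (1.0 + dist)


def mixed_similarity(u, v, alpha=0.5):
    """Hypothetical mixed measure: weighted blend of the two similarities.

    The thesis's actual formula is not given in the abstract; alpha is assumed.
    """
    return alpha * cosine_similarity(u, v) + (1.0 - alpha) * euclidean_similarity(u, v)


def select_clean_samples(samples, class_vectors, threshold=0.8, alpha=0.5):
    """Keep (sample_id, label) pairs whose text vector is similar to the class vector.

    samples: iterable of (sample_id, label, text_vector)
    class_vectors: dict mapping label -> word vector for that class
    threshold: assumed cut-off; samples scoring below it are treated as noisy
    """
    clean = []
    for sample_id, label, text_vector in samples:
        score = mixed_similarity(text_vector, class_vectors[label], alpha)
        if score >= threshold:
            clean.append((sample_id, label))
    return clean
```

In this sketch, a sample whose text vector matches its class vector is kept, while one whose text points to a different class falls below the threshold and is excluded from the baseline dataset. Real word vectors would come from an embedding model, which is outside the scope of this illustration.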

Keywords/Search Tags:

deep learning, image classification, data purification, multi-view learning

Related items

1	Research Of Classification Method On High-Dimensional Image Data
2	Research On Deep Learning Based Multi-view Representation Learning Techniques
3	Study Of Supervised And Semi-supervised Multi-view Feature Learning Methods
4	Research Of View-Learning Based Classification Methods
5	Research On Large Scale Image Classification Based On Multi-View Learning
6	Object Classification Methods Of 3D Radar Images Based On Deep Learning
7	Research On Multi-view Subspace Learning Based On Deep Learning And NMF
8	Research On Multi-view Learning Under Complex Application Situations
9	Research On Hierarchical Robust Multi-view Learning
10	Research And Application Of Deep Learning And Weak Supervision For Multi-Label Image Classification