Confusion Graph Based Fault Analysis Of Deep Learning Models And Label Cleaning Of Large Scale Image Datasets

Posted on: 2018-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: R C Jin
GTID: 2428330623450719
Subject: Computer Science and Technology
Abstract/Summary:
Although deep learning for image classification has developed rapidly and produced significant experimental results, researchers still face three major challenges. First, without a solid theoretical basis, it is difficult to perform fault tolerance analysis on deep learning models. Second, state-of-the-art models usually have complex structures, which makes model design and optimization harder. Third, training deep learning models depends heavily on large-scale annotated datasets, yet mislabeled images are unavoidable and large datasets with high-quality annotations are expensive to obtain.

To address these three challenges, this thesis proposes a "confusion graph" model based on undirected graphs, which quantifies a deep model's visual confusion between different image categories. By applying a community detection algorithm to the confusion graph, we extract the community structure inside the graph, which helps researchers analyse the weaknesses of deep models, understand their classification failures, and prepare for future fault tolerance analysis. An extensive confusion-graph-based analysis of the outstanding models from the ILSVRC image classification challenge over the years shows that this analytical method is effective.

In addition, with the assistance of the confusion communities, we propose the "Expert Sub-net" structure to help the original classification model improve its accuracy. With Expert Sub-nets added for fine-grained classification, we lower the Top-1 error rate of AlexNet by 1.49% and that of vgg-verydeep-16 by 3.45%.

Last but not least, to automatically identify mislabeled images in general classification datasets and face datasets, we combine the confusion community information with the deep model's outputs and introduce the community detection algorithm to data cleaning. Using this method, we removed mislabeled images from the large-scale MS-Celeb-1M face image dataset (containing approximately 10 million images) and obtained a face dataset with high-quality annotations called C-MS-Celeb (6,464,018 images of 94,682 celebrities). By training a single-net model on C-MS-Celeb, without fine-tuning, we achieve 99.67% at the Equal Error Rate on the LFW face recognition benchmark, which is comparable to other state-of-the-art results. This demonstrates the positive effect of data cleaning on model training.
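
The confusion graph idea can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the edge-weight rule (add the smaller of the two scores whenever two classes share an image's top-k predictions), the function names, and the toy scores are all assumptions made for illustration. Grouping by connected components over strong edges is used here only as a simple stand-in for a real community detection algorithm.

```python
from collections import defaultdict
from itertools import combinations


def build_confusion_graph(score_rows, top_k=2):
    """Accumulate undirected edge weights between classes that appear
    together in the top-k scores of the same image."""
    graph = defaultdict(float)
    for scores in score_rows:
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        for a, b in combinations(sorted(ranked[:top_k]), 2):
            # Assumed weighting rule: add the smaller of the two class scores.
            graph[(a, b)] += min(scores[a], scores[b])
    return dict(graph)


def confusion_groups(graph, num_classes, min_weight):
    """Group classes by connected components over edges whose weight is
    at least min_weight (a stand-in for community detection)."""
    parent = list(range(num_classes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), weight in graph.items():
        if weight >= min_weight:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for c in range(num_classes):
        groups[find(c)].append(c)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical per-image class scores for a 4-class model.
scores = [
    [0.6, 0.4, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.7, 0.3],
    [0.9, 0.0, 0.1, 0.0],
]
graph = build_confusion_graph(scores)
groups = confusion_groups(graph, num_classes=4, min_weight=0.25)
print(groups)
```

In this toy data, classes 0 and 1 are frequently confused and end up in one group, as do classes 2 and 3, while the weak edge between classes 0 and 2 falls below the threshold and is ignored.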
Keywords/Search Tags: Image Classification, Visualization Analysis, Deep Learning, Data Annotation Cleaning