Confusion Graph Based Fault Analysis Of Deep Learning Models And Label Cleaning Of Large Scale Image Datasets

Posted on:2018-02-14

Degree:Master

Type:Thesis

Country:China

Candidate:R C Jin

GTID:2428330623450719

Subject:Computer Science and Technology

Abstract/Summary:

Although deep learning for image classification has advanced rapidly and produced strong experimental results, researchers still face three major challenges. First, without a solid theoretical basis, it is difficult to perform fault tolerance analysis for deep learning models. Second, state-of-the-art models usually have complex structures, which makes model design and optimization harder. Third, training deep learning models depends heavily on large-scale annotated datasets; however, mislabeled images are unavoidable, and large datasets with high-quality annotations are expensive to obtain.

To address these three challenges, this thesis proposes a "confusion graph" model based on undirected graphs, which is used to accurately quantify a deep model's visual confusion between image categories. By applying a community detection algorithm to the confusion graph, we extract the graph's community structure, which helps researchers analyse the weaknesses of deep models, understand their classification failures, and prepare for future fault tolerance analysis. An extensive confusion-graph-based analysis of the leading models from the ILSVRC image classification challenge over the years shows that the analytical method is effective.

In addition, with the help of the confusion communities, we propose the "Expert Sub-net" structure to improve the accuracy of the original classification model. With the Expert Sub-net added for fine-grained classification, we lower the Top-1 error rate of AlexNet by 1.49% and that of vgg-verydeep-16 by 3.45%.

Finally, to automatically identify mislabeled images in general classification datasets and face datasets, we combine the confusion community information with the deep model's outputs and apply the community detection algorithm to data cleaning. Using this method, we removed mislabeled images from the large-scale MS-Celeb-1M face image dataset (approximately 10 million images) and obtained C-MS-Celeb, a face dataset with high-quality annotations (6,464,018 images of 94,682 celebrities). By training a single-net model on C-MS-Celeb, without fine-tuning, we achieve 99.67% at the Equal Error Rate on the LFW face recognition benchmark, which is comparable to other state-of-the-art results. This demonstrates the positive effect of data cleaning on model training.
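
As a rough illustration of the ideas summarised in the abstract, the Python sketch below builds a small confusion graph from per-image class scores, groups classes into communities, and flags images whose prediction falls outside the community of their label. It is not the thesis's implementation. The edge weighting (the smaller of the two top scores), the `top_k` and `min_weight` parameters, the function names, and the toy scores are all assumptions made for this example, and thresholded connected components stand in for a real community detection algorithm.

```python
from collections import defaultdict
from itertools import combinations


def build_confusion_graph(score_rows, top_k=2):
    """Accumulate an undirected, weighted confusion graph.

    score_rows holds one list of class scores per image (e.g. softmax
    outputs). The top_k classes of each image are linked pairwise, and
    the edge weight grows by the smaller of the two scores. This
    weighting is an assumption, not the thesis's definition.
    """
    graph = defaultdict(float)
    for scores in score_rows:
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        for a, b in combinations(sorted(ranked[:top_k]), 2):
            graph[(a, b)] += min(scores[a], scores[b])
    return dict(graph)


def communities(graph, num_classes, min_weight=0.5):
    """Group classes joined by edges of weight >= min_weight.

    Uses union-find to compute connected components, a crude stand-in
    for the community detection step.
    """
    parent = list(range(num_classes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (a, b), weight in graph.items():
        if weight >= min_weight:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for c in range(num_classes):
        groups[find(c)].add(c)
    return sorted(sorted(g) for g in groups.values())


def suspicious_labels(score_rows, labels, groups):
    """Return indices of images whose top prediction lies outside the
    community of their annotated label."""
    community_of = {c: i for i, g in enumerate(groups) for c in g}
    flagged = []
    for idx, (scores, label) in enumerate(zip(score_rows, labels)):
        predicted = max(range(len(scores)), key=lambda c: scores[c])
        if community_of[predicted] != community_of[label]:
            flagged.append(idx)
    return flagged


# Toy scores for 4 hypothetical classes: 0 and 1 look alike, as do 2 and 3.
scores = [
    [0.60, 0.40, 0.00, 0.00],
    [0.45, 0.55, 0.00, 0.00],
    [0.00, 0.00, 0.70, 0.30],
    [0.00, 0.00, 0.40, 0.60],
]
labels = [1, 1, 2, 0]  # the last label disagrees with the model across communities

graph = build_confusion_graph(scores)
groups = communities(graph, num_classes=4)
flagged = suspicious_labels(scores, labels, groups)
print(groups, flagged)  # [[0, 1], [2, 3]] [3]
```

In this toy run, the first image is labelled 1 but predicted 0. Because both classes sit in the same community, the sketch treats it as fine-grained confusion rather than a likely annotation error; only the last image, whose prediction lands in a different community from its label, is flagged.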

Keywords/Search Tags:

Image Classification, Visualization Analysis, Deep Learning, Data Annotation Cleaning

Related items

1	Image Analysis And Annotation Based On Deep Learning
2	Research On Cleaning Image Data Based On Deep Learning
3	Research On Image Annotation Method Based On Multimodal Deep Kernel Learning
4	The Design And Implementation Of Medical Image Annotation System For Deep Learning
5	Research On Key Issues Of Image Classification And Annotation By Fusing Text Information
6	Design And Realization Of Patent Data Cleaning And Visualization Module
7	Research On Image Classification And Annotation Based On Deep Learning
8	Automatic Image Annotation Based On Deep Learning With Robust Strategies
9	Research Of Large-Scale Web Image Annotation And Interpretation
10	Research Of Large Scale Automatic Image Annotation Based On Deep Learning