
Research On Deep Learning Based Visual Recognition With Limited Label Resources

Posted on: 2019-02-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B B Gao
Full Text: PDF
GTID: 1318330545475687
Subject: Computer Science and Technology
Abstract/Summary:
With the arrival of large-scale data and the dramatic growth of computing capacity, deep learning technologies represented by convolutional neural networks (CNN) have made breakthroughs in various visual recognition tasks. The success of existing deep learning algorithms hinges on large-scale, accurately labeled training data. However, visual recognition with limited label resources (insufficient or uncertain labels) is common in practical applications. This has become a new challenge in the computer vision community, but it has not yet been well studied. This thesis therefore addresses label-constrained visual recognition in terms of feature representation and feature learning. Its applications mainly include single-label image recognition, multi-label image recognition, scene classification, video classification, facial attribute estimation, head pose estimation, and semantic segmentation. The main contributions can be summarized as follows:

1. A deep spatial pyramid (DSP) framework is proposed, which is a general image recognition system using deep features. We explore five important factors for using deep learning features, analyze their impact on feature representations, and provide corresponding recommendations: (1) The activations of convolutional (Conv) layers contain more spatial information and require less computation than those of the fully connected (FC) layers. Thus the features of Conv layers are more efficient than those of FC layers; (2) Frobenius norm normalization is more effective than no normalization or l2 vector normalization; (3) The proposed deep spatial pyramid can encode spatial information very naturally; (4) Encoding deep descriptors needs only a small number of Gaussian components in the Fisher Vector, which is the complete opposite of the experience with encoding SIFT descriptors; (5) Using multi-scale deep features can effectively improve the performance of a visual recognition system. The proposed DSP framework is a simple, efficient, yet highly accurate image classification system. The effectiveness of DSP is also validated on many benchmark datasets.

2. A discriminant distribution distance (D3) representation is proposed, which converts a set of instance vectors into a vector representation. In computer vision, a visual entity (image or video) is often represented as a set of descriptors, and it is crucial to design a powerful representation method that encodes a set of vectors as a single vector. Existing methods such as FV or VLAD are designed from a generative perspective, and their performance fluctuates when different types of instance vectors (e.g., dense SIFT or deep learning features) are used. The proposed D3 method treats the two sets as two distributions and proposes a directional total variation distance (DTVD) to measure their dissimilarity. Furthermore, a classifier-based method is proposed to estimate DTVD robustly and to represent these sets efficiently. D3 is evaluated on video action recognition and image recognition tasks. It achieves excellent robustness, accuracy, and speed.

3. A deep label distribution learning (DLDL) framework is proposed, which effectively utilizes label ambiguity in both feature learning and label distribution learning. CNN has achieved excellent recognition performance in various visual recognition tasks. A large-scale labeled training set is one of the most important factors in its success. However, it is difficult to collect sufficient training images with precise labels in some domains, such as apparent age estimation, head pose estimation, multi-label classification, and semantic segmentation. Fortunately, there is ambiguous information among labels, which makes these tasks different from traditional classification. Based on this observation, we convert the label of each image into a discrete label distribution, and learn the label distribution by minimizing the Kullback-Leibler divergence between the predicted and ground-truth label distributions using a deep CNN. The proposed DLDL method effectively utilizes the label ambiguity and prevents the network from over-fitting even when the training set is insufficient. Experimental results show that the proposed approach produces significantly better results than state-of-the-art methods for age estimation and head pose estimation. At the same time, it also improves recognition performance on multi-label classification and semantic segmentation tasks.

4. A deep learning framework is proposed, which jointly learns label distribution and expectation regression. Ranking-CNN and DLDL are the state-of-the-art methods in facial attribute (e.g., age or attractiveness) estimation tasks. However, these methods have an inconsistency between the training objective and the evaluation metric, so they may be suboptimal. In addition, they always adopt image classification or face recognition models with a large number of parameters, which incurs expensive computation cost and storage overhead. To alleviate these issues, we design a lightweight network architecture and propose a unified framework that jointly learns the label distribution and regresses the real-valued label. Furthermore, we explore the relationship between Ranking-CNN and DLDL, and provide the first theoretical analysis showing that the ranking method is in fact learning a label distribution implicitly. This result thus unifies Ranking-CNN into the DLDL framework. The effectiveness of the proposed method has been demonstrated on age and attractiveness estimation tasks. It achieves new state-of-the-art results using a single model with 36x fewer parameters and a 2.6x reduction in inference time. Moreover, it achieves results comparable to the state of the art even when the model parameters are further reduced to 0.9M (3.8MB disk storage).
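
The label-distribution idea in contributions 3 and 4 can be illustrated with a minimal Python sketch. It is not the thesis implementation. The discretized Gaussian shape of the target distribution, the value of `sigma`, the age range 0 to 100, and all function names are assumptions made for illustration; the abstract only states that each label is converted into a discrete label distribution, that the Kullback-Leibler divergence to the predicted distribution is minimized, and that the expectation is regressed jointly.

```python
import math

def label_distribution(true_label, labels, sigma=2.0):
    """Turn a scalar label into a discrete distribution over candidate labels.

    Assumption: a discretized Gaussian centered on the true label; the
    abstract does not specify the shape of the distribution.
    """
    weights = [math.exp(-((l - true_label) ** 2) / (2 * sigma ** 2)) for l in labels]
    total = sum(weights)
    return [w / total for w in weights]

def softmax(logits):
    """Map network outputs (logits) to a predicted label distribution."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted), the label-distribution training loss."""
    return sum(
        t * math.log((t + eps) / (p + eps))
        for t, p in zip(target, predicted)
        if t > 0
    )

def expected_label(distribution, labels):
    """Expectation of the distribution, usable as a real-valued estimate."""
    return sum(p * l for p, l in zip(distribution, labels))

# Hypothetical example: apparent age 30 over candidate ages 0..100.
ages = list(range(0, 101))
target = label_distribution(30, ages)
predicted = softmax([0.0] * len(ages))   # an untrained, uniform prediction
loss = kl_divergence(target, predicted)  # positive until the prediction matches
estimate = expected_label(target, ages)  # close to 30 for this symmetric target
```

In a joint setup of the kind contribution 4 describes, a regression term on `expected_label(predicted, ages)` would be added to the KL term, so that the training objective is closer to the evaluation metric; how the two terms are weighted is not stated in the abstract.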
Keywords/Search Tags: Feature Representation, Label Distribution, Convolutional Neural Network, Deep Learning, Label Constrained, Visual Recognition