Research On Few-shot Learning And Model Light-weighting In Image Recognition

Posted on: 2022-03-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W J Li
Full Text: PDF
GTID: 1488306323962529
Subject: Computer application technology
Abstract/Summary:
Since 2012, deep neural networks have achieved excellent performance in many fields, such as computer vision, speech recognition, natural language processing, and smart medical care. This success is partly fortuitous but also inevitable. In essence, it rests on three foundations: fully labeled big data, deep neural networks (with large numbers of parameters), and GPUs (high-performance computing). However, the heavy dependence on these three foundations also limits the further development and popularization of artificial intelligence. First, deep neural networks must be trained with a large amount of fully labeled data; otherwise the accuracy of the model drops sharply. In many new real-world scenarios, however, samples are difficult to obtain and the labeling cost is very high. Few-shot learning studies how to quickly learn new categories from a small number of labeled samples. Second, deep learning models must be trained and tested on hardware platforms with sufficient storage space and strong computing performance. For many computationally limited platforms, such as edge devices (e.g., smartphones, smart watches, self-driving cars) and offline monitoring equipment (e.g., public cameras, car driving recorders), existing well-trained large models are impractical and cannot be deployed smoothly.

Therefore, to meet the needs of real-world image recognition applications, this dissertation addresses two problems: few-shot learning and model light-weighting. First, for general few-shot tasks, we study the metric learning model, the feature extraction method, and how to use unlabeled samples in few-shot learning. Building on this study, the dissertation further investigates the few-shot learning problem in a specific fine-grained image task, namely the one-view problem in person re-identification. Then, the dissertation studies the model light-weighting problem in intelligent vehicle applications, taking driving behavior analysis as a representative application. Person re-identification and driving behavior analysis are two typical applications in public safety and traffic safety: person re-identification focuses on an outdoor open scene, while driving behavior analysis studies a closed scene inside the vehicle. In summary, this dissertation aims to reduce the cost of popularizing image recognition applications and to break through the scene limitations of artificial intelligence, so that it can benefit more people in a wider range of fields. The main contributions are summarized as follows.

First, to solve the few-shot problem, a new local and global measure is designed from the perspective of metric learning. The local similarity takes into account not only the pair-wise relationship between support and query samples, but also the relationships among the support samples and among the query samples. A novel global similarity is then proposed to make full use of the local information of each task. Experimental results on the handwritten character dataset Omniglot and the general image recognition datasets miniImageNet and tieredImageNet show that combining local and global similarity improves few-shot learning performance.

Second, a new contextual-similarity-based multi-level second-order attention model is proposed to solve the semi-supervised few-shot learning problem from the perspective of data. To improve the representation power of the backbone network, we propose to extract second-order attention features without increasing the number of parameters. To take full advantage of the positive samples in the unlabeled set, a new context-based similarity is proposed. Experimental results on the handwritten character dataset Omniglot, the general image recognition datasets miniImageNet and tieredImageNet, and the fine-grained image recognition dataset CUB demonstrate the superiority of the algorithm.

Third, from an application point of view, we shift the focus of research from general problems to specific tasks and study solutions to few-shot problems in cross-camera scenarios in person re-identification. Specifically, we propose a new one-view learning method (OVL) inspired by semi-supervised few-shot learning. OVL requires only a very low annotation cost: labeled training images are provided from only one camera view (the source view/domain), while annotations of training images from the other camera views (the target views/domains) are not available. An adversarial multi-view learning (AMVL) module is proposed, which learns a multi-view discriminator by adversarial learning to align the feature distributions across all views. An adversarial unknown rejection learning (AURL) module is designed to reject unknown samples from the target views through adversarial learning. Experimental results on three datasets, Market-1501, DukeMTMC-reID and MSMT17, show that OVL outperforms existing domain adaptation and semi-supervised methods.

Lastly, to address the problem of model light-weighting, we study driving behavior analysis. A new instance-specific multi-teacher knowledge distillation method (IsMt-KD) is proposed for learning a lightweight convolutional neural network with high accuracy and fast speed. Experimental results on the AUC and StateFarm datasets show that the lightweight model distilled by IsMt-KD achieves accuracy comparable to that of the large teacher networks while retaining the advantages of speed and small model size.

Overall, this dissertation studies real-world image recognition problems, including few-shot learning and model light-weighting. Extensive experiments on many image recognition datasets show the effectiveness of the proposed methods. The proposed methods offer new ideas for real-world image recognition scenarios and provide a research reference for open-scene and closed-scene applications of image recognition.
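As a rough illustration of the metric-based few-shot setting the abstract refers to, the sketch below classifies a query sample by comparing it with the mean feature vector (prototype) of each class's few labeled support samples. This is a generic nearest-prototype baseline with cosine similarity, not the local and global similarity proposed in the dissertation. The function names (`cosine`, `prototypes`, `classify`) and the toy feature vectors are assumptions made for the example; in practice the features would come from a trained backbone network.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def prototypes(support):
    """Average the support vectors of each class into one prototype.

    support: dict mapping a class label to a list of feature vectors
    (the few labeled "shots" for that class).
    """
    protos = {}
    for label, vecs in support.items():
        dim = len(vecs[0])
        protos[label] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return protos


def classify(query, support):
    """Return the label whose prototype is most similar to the query."""
    protos = prototypes(support)
    return max(protos, key=lambda label: cosine(query, protos[label]))


# Toy 2-way 2-shot episode with hypothetical 3-dimensional features.
support = {
    "cat": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "dog": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
print(classify([0.7, 0.3, 0.0], support))
```

A baseline like this only uses the query-to-support relationship; the abstract's local similarity additionally considers relationships among support samples and among query samples.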
Keywords/Search Tags: Image recognition, Metric learning, Transductive few-shot learning, Semi-supervised few-shot learning, Person re-identification, Model light-weighting, Knowledge distillation