Fine-grained Image Classification In Zero-shot Learning

Posted on: 2020-10-22
Degree: Master
Type: Thesis
Country: China
Candidate: J Wei
GTID: 2428330596995441
Subject: Computer technology
Abstract/Summary:
In recent years, with the emergence of large-scale image data and the refinement of classification tasks, convolutional neural networks have developed rapidly to meet increasingly complex visual classification tasks. However, existing image classification and recognition algorithms cannot identify test samples whose categories do not appear in the training data. Zero-shot learning was proposed to solve this problem of recognizing samples from unseen classes. Current zero-shot learning methods compensate for the missing categories mainly by introducing an auxiliary semantic space and establishing a mapping between the visual and semantic spaces. However, some existing work does not take the characteristics of vision and semantics into account and ignores the large differences between the visual space and the semantic space. In addition, most existing work uses all visual features and all semantics for the mapping. Such methods create many unnecessary mapping relationships between semantic embeddings and visual features and reduce the classification accuracy of zero-shot learning.

Building on the basic bilinear mapping model of zero-shot learning, this thesis proposes two improved methods. By using a nonlinear mapping of low-dimensional embedded visual features and a multi-part attention mechanism, the reliability of the mapping between semantics and vision is improved. In the basic bilinear mapping model, deep convolutional features of images are first obtained from a pre-trained convolutional neural network; semantic vectors are then obtained by manual annotation or unsupervised learning; finally, bilinear mapping functions are learned between the visual features and the semantic vectors.

In the method based on visual low-dimensional embedding, a pre-trained convolutional model is used to obtain the feature vectors of the image, and dictionary learning is then used to embed the feature vectors in a low-dimensional space to reduce redundant information. Finally, several mapping functions are learned between the visual low-dimensional embedding and the semantic vectors, which makes the mapping nonlinear. In the method based on multi-part attention, we use object detection or manual annotation to obtain multiple parts of the image, which reduces background information, and extract the visual features of these parts with a pre-trained deep model. We then use the semantic vectors in an attention mechanism to weight the multi-part visual features, reflecting that different visual regions contribute differently to classification.

Theoretical analysis and experimental comparison show that the two proposed methods improve the classification accuracy of zero-shot learning. With the visual low-dimensional embedding method, our classification accuracy exceeds that of three existing methods on most datasets. With the multi-part attention method, we compare against several state-of-the-art methods on the same fine-grained dataset, and the results show that our method significantly improves classification accuracy on that dataset.
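
As a rough illustration of the bilinear mapping and the semantics-guided part weighting described in the abstract, the following NumPy sketch scores a visual feature against class semantic vectors and pools part features with attention weights. The function names, the toy dimensions, and the softmax form of the attention are assumptions made for illustration; the abstract does not specify the thesis's actual formulation or training objective.

```python
import numpy as np


def bilinear_scores(x, W, S):
    """Score x^T W s_c for every class c.

    x: (d,) visual feature, W: (d, k) learned mapping,
    S: (C, k) matrix with one semantic vector per class.
    """
    return S @ (W.T @ x)


def predict_unseen(x, W, S_unseen):
    """Return the index of the unseen class with the highest compatibility."""
    return int(np.argmax(bilinear_scores(x, W, S_unseen)))


def attend_parts(P, s, W):
    """Weight part features by their compatibility with a semantic vector.

    P: (m, d) features of m image parts, s: (k,) semantic vector.
    Assumption: softmax over bilinear part scores; the thesis may use
    a different attention form.
    """
    logits = P @ W @ s
    weights = np.exp(logits - logits.max())  # shift for numerical stability
    weights /= weights.sum()
    return weights, weights @ P  # (m,) weights and (d,) pooled feature


# Toy usage with hypothetical sizes: d = 4 visual dims, k = 3 attributes.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))        # stands in for a learned mapping
S_unseen = np.eye(3)               # three unseen classes, one-hot semantics
x = rng.normal(size=4)             # stands in for a CNN feature
label = predict_unseen(x, W, S_unseen)
weights, pooled = attend_parts(rng.normal(size=(5, 4)), S_unseen[label], W)
```

In this sketch the unseen classes never contribute training images; they enter only through their semantic vectors in `S_unseen`, which is what allows prediction for categories absent from the training data.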
Keywords/Search Tags: Zero-Shot learning, Fine-grained Image Classification, Attention Mechanism, Visual Low-dimensional Embedding, Multi-part Object Detection