Research On Attention Based Image Classification With Deep Learning

Posted on: 2019-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: P S Wang
GTID: 2428330542497955
Subject: Information and Communication Engineering
Abstract/Summary:
The task of image classification is to find the label that corresponds to a given image from a set of labels. Image classification is one of the most important fields in computer vision and the foundation of many others. In recent years, with the spread of deep learning, image classification research has developed rapidly, resulting in several widely used models. However, some sub-fields, including fine-grained image classification and indoor scene classification, remain highly challenging, with characteristics such as small inter-class differences, large intra-class differences, and no prominent main object. Visual attention, with its ability to focus on selected regions of an image, has been widely used in these fields. However, current uses of visual attention in image classification research are limited in several ways: the attention provided is single-channel or has a very limited number of channels, and so cannot cover all parts of interest; the attention weights are directly applied to features; and the attention is extracted in the form of hard attention, which makes end-to-end training difficult. In this paper, we tackle these problems, which limit the effectiveness of visual attention in fine-grained image classification and indoor scene classification, and propose a set of end-to-end trainable deep image classification models based on multi-channel visual attention.

Firstly, we propose and implement an image classification model with multi-channel attention extracted from convolution activation outputs. With this model, we propose the multi-channel structure of visual attention. The attention weights are the outputs of convolution operations over normalized image features. We also propose a new method of applying attention, which subtracts the mean feature values over each channel of the attention weights in order to extract high-order information from the image features. Applying the attention weights yields high-level features as the image representation, which can be used as the input of a classifier. The proposed end-to-end image classification model surpassed former state-of-the-art methods by a large margin on several fine-grained image classification and indoor scene classification datasets.

Secondly, we propose and implement a fine-grained image classification model with visual attention based on part detection. To further improve the attention model's ability to localize parts of interest, the part location labels provided by fine-grained datasets are used to train a fully convolutional part detection network. The detection network outputs a feature map in which each position marks the detection result for the corresponding region of the image. The feature map also serves as the multi-channel attention weights and is applied to the low-level features of the image, forming an end-to-end image classification model. The proposed model achieved better results on fine-grained image classification than former methods, and it can also produce part detection results.

Last but not least, we propose a scene classification model with visual attention based on multi-level, multi-scale features. This model extracts and combines multi-level and multi-scale features within the attention mechanism framework. Multi-level features are extracted from convolutional networks pre-trained on different datasets; multi-scale features are extracted by scaling the input image to different sizes. The two sets of features, which differ in level and scale and correspond respectively to the attention weights and the low-level image features, are combined by bilinear pooling to obtain the high-level image representation. Experiments show that the proposed method achieved better results than previous methods on indoor scene classification.
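
The multi-channel attention pooling described for the first model can be illustrated with a minimal NumPy sketch. This is not the thesis's implementation; the abstract does not give the exact formulation, so several details here are assumptions: the function name `multi_channel_attention_pool`, the use of L2 normalization, a 1x1 convolution (written as a per-position matrix product) to produce the attention channels, a spatial softmax, and the interpretation of "mean feature values" as the spatial mean of the features. The attention-weighted sum at the end is a bilinear-pooling-style combination of the attention weights with the features.

```python
import numpy as np

def multi_channel_attention_pool(features, attn_kernels, eps=1e-8):
    """Hypothetical sketch of multi-channel attention pooling.

    features:     (H, W, C) convolution activations for one image.
    attn_kernels: (C, K) weights of an assumed 1x1 convolution that
                  produces K attention channels.
    Returns the flattened (K * C,) representation and the (H, W, K)
    attention maps.
    """
    h, w, c = features.shape
    x = features.reshape(h * w, c)

    # Assumption: "normalized image features" means L2-normalising
    # the feature vector at each spatial position.
    x_norm = x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

    # A 1x1 convolution is a per-position linear map: (HW, C) @ (C, K).
    logits = x_norm @ attn_kernels

    # Assumption: soft attention via a softmax over spatial positions,
    # computed separately for each of the K attention channels.
    logits = logits - logits.max(axis=0, keepdims=True)
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=0, keepdims=True)

    # Assumption: the subtracted "mean feature values" are the spatial
    # mean of the features, so each attention channel pools deviations.
    centered = x - x.mean(axis=0, keepdims=True)

    # Attention-weighted sum per channel: (K, HW) @ (HW, C) -> (K, C).
    pooled = attn.T @ centered
    return pooled.reshape(-1), attn.reshape(h, w, -1)

# Usage with random stand-in data (not real network activations).
rng = np.random.default_rng(0)
feats = rng.standard_normal((7, 7, 16))
kernels = rng.standard_normal((16, 4))
representation, attention = multi_channel_attention_pool(feats, kernels)
```

Because every step is differentiable, a structure of this kind can be trained end to end, in contrast to hard attention that selects regions by a discrete choice. The flattened representation would then be passed to a classifier.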
Keywords/Search Tags:Attention mechanism, Fine-grained image classification, Indoor scene classification, Multi-channel Visual attention, Object Detection, Deep learning, Convolutional neural network