Research On Attention Mechanism Of Image Classification And Its Application On Object Detection

Posted on:2021-02-03

Degree:Master

Type:Thesis

Country:China

Candidate:B H Chen

GTID:2428330614468334

Subject:Information and Communication Engineering

Abstract/Summary:

Attention mechanisms are widely used in computer vision. Among them, attention mechanisms for image classification are designed for convolutional neural networks and improve a model's representation ability by recalibrating its features. Because mainstream algorithms in other fields, such as object detection, also rely on convolutional neural networks to extract image features, the performance gains from classification attention mechanisms generalize well to those fields. However, existing work on this kind of attention mechanism still has shortcomings: some algorithms generate attention descriptors using only global pooling and ignore contextual information; the number of parameters and the computational overhead can be large; and some attention structures are designed without fully considering the characteristics of the target model's structure. To alleviate these problems, this thesis focuses on the structural design of attention mechanisms for image classification. The main research contents and innovations are as follows:

1. To address the shortcoming of computing attention descriptors with global pooling alone, we propose a classification attention mechanism based on multi-scale contextual information. The structure uses aggregation and distribution sub-modules to generate multi-scale attention descriptors and their corresponding weights. Constraints are introduced between the two sub-modules to suppress the negative effect of noisy context. In addition, depthwise convolution is used to extract contextual information explicitly, which brings further performance gains. We evaluate the proposed attention module on mainstream image classification datasets, including CIFAR-100 and ImageNet-1K. By visualizing the attention maps, we show that the module helps the model focus on the more discriminative parts of the features. We also verify the generalization ability of the module through object detection experiments. On the ImageNet-1K image classification task, the classification accuracy of ResNet50 equipped with our module improves by 2.30%, exceeding that of ResNet101, which has twice the network depth.

2. Based on the characteristics of efficient networks, we propose a spatial-channel feature-adaptive attention mechanism for image classification built on the embed-expand principle. Currently, most efficient networks are weak at extracting spatial-wise features, and their channel-wise and spatial-wise features are unbalanced. To address this, we first enhance the representation ability of both kinds of features through multidimensional recalibration without any extra parameters. Fully connected layers and convolutional layers are then used to enlarge the receptive field of the attention module, which enhances the flow of feature information. Finally, we adaptively fuse the channel-enhanced and spatial-enhanced features to keep them balanced to some extent. The module is designed for efficient networks, including ShuffleNet V2 and MobileNet V2. Ablation studies are conducted on the ImageNet-1K dataset, and we demonstrate the generalization ability of the module through single-stage object detection experiments. On the ImageNet-1K classification task, ShuffleNet V2 gains about 2.37% in classification accuracy with only 0.1M additional parameters.

3. An object detection system with an embedded attention mechanism is established. We optimize the lightweight single-stage object detection algorithm YOLOv3 in three aspects: structure, training, and inference. The structure enhancement consists of a receptive field-spatial united attention mechanism, which improves the model's representation ability and its robustness to changes in object scale. The training enhancement introduces a number of tricks in the training process that boost model performance without any extra parameters or computation overhead. The inference enhancement optimizes the pipeline of image pre-processing, inference, and post-processing. The optimized system achieves better detection performance than YOLOv3 with 18 times fewer parameters and 6 times faster speed, and it is finally integrated into ROS and deployed on the NVIDIA TX2 platform.
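
As a point of reference for the global-pooling style of channel attention that the abstract identifies as a limitation, the following is a minimal pure-Python sketch. It is a generic squeeze-and-excitation-style illustration and not the module proposed in the thesis; the function names (`global_avg_pool`, `dense`, `channel_attention`), the toy feature map, and all weight values are assumptions made purely for illustration.

```python
import math


def global_avg_pool(features):
    """Reduce a [C][H][W] feature map to one descriptor value per channel."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]


def dense(vector, weights, biases):
    """Fully connected layer: weights is [out][in], biases is [out]."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, biases)
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(features, w1, b1, w2, b2):
    """Rescale each channel by a weight in (0, 1) derived from global pooling."""
    descriptor = global_avg_pool(features)                     # one value per channel
    hidden = [max(0.0, h) for h in dense(descriptor, w1, b1)]  # bottleneck + ReLU
    scales = [sigmoid(s) for s in dense(hidden, w2, b2)]       # one weight per channel
    recalibrated = [
        [[value * scale for value in row] for row in channel]
        for channel, scale in zip(features, scales)
    ]
    return recalibrated, scales


# Toy input: 2 channels of size 2x2. All weights are made-up illustrative values.
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
w1, b1 = [[1.0, 1.0]], [0.0]          # 2 channels -> 1 hidden unit
w2, b2 = [[1.0], [-1.0]], [0.0, 0.0]  # 1 hidden unit -> 2 channel weights
recalibrated, scales = channel_attention(features, w1, b1, w2, b2)
```

Because the descriptor is a single average per channel, two feature maps with the same channel means but different spatial layouts receive identical weights. This is the loss of context information that the abstract refers to.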

Keywords/Search Tags:

attention mechanism, image classification, object detection, convolutional neural network, efficient network

Related items

1	Research On Attention Based Image Classification With Deep Learning
2	Object Classification And Detection Based On Attention Mechanism And Knowledge Distillation
3	Research On The Method Of Image Salient Object Detection Based On Convolutional Neural Network
4	Method Research On Object Detection In Real Scene
5	Multi-scale Features Fusion Network For Salient Object Detection
6	Video Object Detection Based On Adaptive Convolution Network And Visual Attention Mechanism
7	The Research Of Image Classification Methods Based On Convolution Neural Network
8	Efficient And Lightweight Feature Pyramid Network For Object Detection
9	Research On Convolution Neural Network For Clothing Classification
10	Salient Object Detection Based On Global And Local Perception