Research On Attention Mechanism Of Image Classification And Its Application On Object Detection

Posted on: 2021-02-03
Degree: Master
Type: Thesis
Country: China
Candidate: B H Chen
GTID: 2428330614468334
Subject: Information and Communication Engineering
Abstract/Summary:
Attention mechanisms have been widely used in computer vision. Among them, the attention mechanism for image classification is designed for convolutional neural networks and improves the representation ability of the model by recalibrating features. In addition, since mainstream algorithms in other fields, e.g. object detection, use convolutional neural networks to extract image features, the performance gains brought by classification attention mechanisms generalize well to these fields. However, related work on this kind of attention mechanism still has shortcomings: some algorithms use only global pooling to generate attention descriptors and ignore context information; the number of parameters and the computational overhead may be large; and some attention structures are designed without fully considering the characteristics of the target model structure. To alleviate these problems, this thesis focuses on the structural design of attention mechanisms for image classification. The main research contents and innovations are as follows:

1. To address the shortcoming of using only global pooling to compute attention descriptors, we propose a classification attention mechanism based on multi-scale contextual information. The structure uses aggregation and distribution sub-modules to generate multi-scale attention descriptors and their corresponding weights. Constraints are introduced between these two sub-modules to suppress the negative effect of noisy context. In addition, depthwise convolution is used to extract contextual information explicitly, which brings further performance gains. In the experiments, we verify the performance of the proposed attention module on mainstream image classification datasets, including CIFAR-100 and ImageNet-1K. By visualizing the attention maps, we demonstrate that the module helps the model focus on the more discriminative parts of the features. Moreover, we verify the generalization ability of the module through object detection experiments. In the ImageNet-1K image classification task, the classification accuracy of ResNet50 equipped with our module improves by 2.30%, exceeding that of ResNet101, which is twice as deep.

2. Based on the characteristics of efficient networks, we propose a spatial-channel feature-adaptive attention mechanism for image classification built on the embed-expand principle. Currently, most efficient networks are weak at extracting spatial-wise features and are unbalanced between channel-wise and spatial-wise features. In view of this, we first enhance the representation ability of both kinds of features through multidimensional recalibration without any extra parameters. Fully-connected layers and convolutional layers are then used to enlarge the receptive field of the attention module, which enhances the flow of feature information. Finally, we adaptively fuse the channel-enhanced and spatial-enhanced features to keep them balanced to some extent. This module is designed for efficient networks, including ShuffleNet V2 and MobileNet V2. Ablation studies are conducted on the ImageNet-1K dataset, and we demonstrate the generalization ability of the module through single-stage object detection experiments. In the ImageNet-1K classification task, ShuffleNet V2 gains about 2.37% in classification accuracy with only 0.1M additional parameters.

3. An object detection system with an embedded attention mechanism is established. We optimize the lightweight single-stage object detection algorithm YOLOv3 in three respects: structure, training, and inference. The structure enhancement consists of a joint receptive-field and spatial attention mechanism, which improves the model's representation ability and its robustness to changes in object scale. Training enhancement refers to introducing a number of training tricks that boost model performance without any extra parameters or computational overhead. Inference enhancement refers to optimizing the pipeline of image pre-processing, inference, and post-processing. The optimized system achieves better detection performance than YOLOv3 with 18 times fewer parameters and 6 times higher speed, and is finally integrated into ROS and deployed on the NVIDIA TX2 platform.
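
The abstract notes that some existing attention modules build their descriptors from global pooling alone and then recalibrate features with them. As a rough illustration of that baseline only (not the modules proposed in the thesis), the following minimal Python sketch pools each channel to a single value, maps the pooled descriptor to per-channel gates, and rescales the channels. The function names `global_average_pool` and `channel_attention`, and the `weights` and `biases` arguments, are hypothetical; a real module would learn these parameters and would typically use a bottleneck layer.

```python
import math


def global_average_pool(feature_map):
    """Collapse each channel (a 2-D list of floats) to its mean value."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def sigmoid(x):
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feature_map, weights, biases):
    """Recalibrate channels with gates computed from a global descriptor.

    feature_map: list of C channels, each an H x W list of floats.
    weights:     hypothetical C x C matrix (assumed, not from the thesis).
    biases:      hypothetical length-C vector (assumed, not from the thesis).
    """
    # One scalar per channel: this descriptor carries no spatial context.
    descriptor = global_average_pool(feature_map)

    # A single linear layer followed by a sigmoid yields one gate per channel.
    gates = [
        sigmoid(sum(w * d for w, d in zip(row, descriptor)) + b)
        for row, b in zip(weights, biases)
    ]

    # Recalibration: every value in a channel is scaled by that channel's gate.
    recalibrated = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return recalibrated, gates
```

Because the descriptor is a single average per channel, every spatial position in a channel receives the same gate, which is the limitation the multi-scale contextual design in item 1 is meant to address.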
Keywords/Search Tags: attention mechanism, image classification, object detection, convolutional neural network, efficient network