Convolutional Neural Network Architecture For Image Recognition

Posted on: 2020-07-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
Full Text: PDF
GTID: 2428330620460005
Subject: Information and Communication Engineering
Abstract/Summary:
In recent years, deep learning has dominated the field of computer vision. Supported by powerful computational resources such as modern GPUs, it is now possible to train a hierarchical structure with multiple layers that captures different levels of visual patterns. Deep convolutional neural networks have achieved state-of-the-art accuracy in various computer vision tasks. However, simply increasing the number of network layers causes some obvious problems in practice. First, accuracy saturates once the network becomes very deep: it stops rising with the number of layers and can even decline. This phenomenon is known as the model degradation problem. Stacking convolutional layers does not increase the nonlinear terms learned by the model, so it does not help fit feature distributions in complex problems. Second, more layers bring more parameters and more computation, which lengthens training time and is impractical for wide application. There is therefore still much room for improvement in convolutional neural network architectures.

Inspired by the transformation of single-branch convolutional neural networks into multi-branch structures, we propose a novel approach named Second-Order Response Transform (SORT), which appends an element-wise product transform to the linear sum of a two-branch network module. This modification brings two benefits. First, SORT facilitates cross-branch information propagation: it rewards consistent responses in forward propagation, and it enables each branch to update its weights based on the current status of the other branch in back-propagation. Second, the nonlinearity of the module becomes stronger, which allows the network to fit more complicated feature distributions. Adding this operation is also cheap: applying SORT to ResNet-50 requires only a small amount of extra computation and improves its accuracy by 0.28% (6.17% relatively) on ILSVRC2012 classification.

Convolution is spatially symmetric, i.e., the visual features are independent of their position in the image, which limits its ability to use contextual cues for visual recognition. Inspired by the Part-Based Model, we introduce the idea of capturing correlations between local features into the construction of the network model. Considering the semantic relevance of neighboring pixels in local regions, we make each neuron refer to the responses of its surrounding points, without adding extra supervision, in order to compute an importance score that is then used as a coefficient to reweight the neuron's response. In addition, we introduce a multi-scale approach to scoring, which extracts visual cues from surrounding regions at multiple scales. Our approach is named multi-scale spatially asymmetric recalibration (MS-SAR). MS-SAR is implemented efficiently, so that only a small fraction of extra parameters and computation is required. We apply MS-SAR to several popular building blocks, including the residual block and the densely-connected block, and demonstrate its superior performance on ILSVRC2012 classification, where it improves ResNeXt-50 by 5.56% relatively with only a 2% increase in computation.
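
The two modules described above can be illustrated with a minimal NumPy sketch. It is not the thesis implementation, and every function name in it is hypothetical. For SORT, the sketch assumes the simplest reading of the abstract: the element-wise product of the two branch responses is added to their sum, without any normalization the thesis may apply. For MS-SAR, it assumes a fixed scoring rule (the mean of local max-pooled responses at several window sizes, passed through a sigmoid), whereas the abstract implies the scoring has a small number of learned parameters.

```python
import numpy as np


def sort_combine(f1, f2):
    """Two-branch fusion: linear sum plus element-wise product (assumed form)."""
    return f1 + f2 + f1 * f2


def sort_backward(f1, f2, upstream):
    """Gradients of sort_combine with respect to each branch response.

    d(out)/d(f1) = 1 + f2 and d(out)/d(f2) = 1 + f1, so the update of each
    branch depends on the current response of the other branch.
    """
    return upstream * (1.0 + f2), upstream * (1.0 + f1)


def local_max_pool(x, k):
    """Stride-1 max pooling over a k x k window (k odd), output same size as x."""
    pad = k // 2
    padded = np.pad(x, pad, mode="edge")
    h, w = x.shape
    out = np.empty((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].max()
    return out


def multiscale_recalibrate(x, scales=(1, 3, 5)):
    """Reweight each response by a score computed from its surroundings.

    ASSUMPTION: the score is a fixed function (sigmoid of the mean pooled
    response over all scales); a learned scoring function would replace it.
    """
    x = np.asarray(x, dtype=float)
    pooled = np.stack([local_max_pool(x, k) for k in scales])
    score = 1.0 / (1.0 + np.exp(-pooled.mean(axis=0)))
    return x * score
```

In this sketch, two positions with the same response but different neighborhoods receive different scores, which is the sense in which the recalibration is spatially asymmetric.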
Keywords/Search Tags: Large-scale Image Classification, Convolutional Neural Networks, Second-Order Response Transform, Multi-scale Pooling, Spatially Asymmetric Recalibration