Font Size: a A A

Research On Image Classification Based On Second-Order Structural Layer Embedding Convolution Network

Posted on:2020-07-02Degree:MasterType:Thesis
Country:ChinaCandidate:S Y GeFull Text:PDF
GTID:2428330590497171Subject:Information and Communication Engineering
Abstract/Summary:PDF Full Text Request
As the mainstream model in the field of computer vision,convolutional neural networks(CNNs)have been widely applied in image classification,image segmentation super-resolution as well as other fields.CNNs usually use a fully-connected layer or global average pooling layer to aggregate the feature maps outputted from convolutional layer.However,the rich information contained in the feature maps cannot be fully modeled by these layers.In order to further enhance the discrimination of the network,it is a novel idea to aggregate feature maps by designing and embedding a structural layer with stronger modeling ability into CNNs.Among them,the second-order pooling methods have attracted more and more attention from scholars,because of they can capture complex relations.This paper focuses on how to design and embed the second-order pooling structural layer into convolutional network,and proposes three kinds of convolution networks based on the second-order pooling to improve the existing methods.The B-CNN models the second-order statistical information of the distribution of features,by calculating the outer product of the feature map.However,the outer product operation can only capture the linear relationship between the channels in the feature map,and failing model the nonlinear relationship between the channels.Therefore it cannot fully utilize the information contained in the feature map.To solve this problem,we propose three kinds of kernelized bilinear pooling and insert respectively them into CNNs.The presented kernelized bilinear pooling uses kernel functions to model the nonlinear relationship between channels,making full use of the effective information in the feature map to obtain more discriminative image representation.Besides,the gradients with respect to kernelized bilinear pooling are completely given,making kernelized bilinear pooling embedded into the network for end-to-end training.The attention mechanism can effectively perform feature recalibration and enhance the expressive ability of the network.This dissertation proposes a channel attention module based on second-order pooling.The module computes the correlation between any two channels and adaptively recalibrates every single channel by weighted sum of all channels.The generated feature map boosts the feature representation,greatly improving the classification performance of the network.Different from the channel attention module in SENet,which only uses first-order information,the presented method more effectively models the inter-channel relationship.At the same time,compared with the non-local convolutional network,which utilizes the correlation between features as weights,the attention module proposed in this paper is more suitable for image classification.The image representation dimensions of second-order modeling in CNNs usually is of high.In addition,such methods only model the global second-order statistical information of feature distributions,ignoring the local distribution information of features.Inspired by the NetVLAD,this paper proposes the local second-order pooling network(LSOP)based on feature space.The LSOP employs visual words to divide the distribution of feature space,and performs second-order pooling inside each cluster space to describe the local second-order statistical information of the feature distribution in detail.Further,in order to overcome the high-dimensional representations caused by second-order pooling,LSOP adopts the tensor sketch to compact them,which significantly reduces the dimension of image representation.The proposed three methods are widely evaluated on many benchmarks.The experimental results show that our methods are superior to their counterparts.Specially,the kernelized bilinear convolution network achieves leading performance.The LSOP network not only harvests competitive performance,but also has a far lower dimension of image representation than related works.
Keywords/Search Tags:Image Classification, Second-Order Pooling, Convolutional Neural Networks, End-To-End Learning
PDF Full Text Request
Related items