
Research On Efficient And Compact Convolution Features

Posted on: 2020-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: H Zhang
Full Text: PDF
GTID: 2428330575954957
Subject: Computer technology
Abstract/Summary:
Deep learning adopts a stack of multiple non-linear transformations to extract high-level representations from data. Benefiting from massive training data and great computation power, deep convolutional neural networks have become the state-of-the-art methods for several computer vision tasks. Despite their excellent performance, deep neural networks demand a huge resource cost, and in resource-constrained environments these "cumbersome" models are hard to deploy in many real-world tasks. The core of a convolutional neural network is its stack of convolution layers for feature extraction. Therefore, designing efficient and compact convolution features has great scientific significance and real-world application value. A convolution feature is a distributed representation, meaning there is a "many-to-many" mapping between channels and semantic concepts. Based on this distributed nature of convolution features, this dissertation aims at designing efficient and compact convolution features. The main contributions are summarized as follows.

1. Efficient convolution feature extraction based on pre-trained models. Obtaining labeled training data is expensive. A recent research focus is to extract convolution feature representations from deep models pre-trained on a large-scale dataset, without resorting to image labels. Current approaches typically generate image representations by global pooling within each channel of the convolutional result independently. However, since convolutional results have distributed concepts spreading among channels, these methods cannot sufficiently excavate the valuable underlying cues in pre-trained models. In this dissertation, we propose Gram aggregation to create powerful image representations based on both within-channel and between-channel statistics of convolutional results. Taking unsupervised image retrieval as an example, experimental results show that between-channel correlations play a crucial role in retrieval performance.

2. Efficient convolution feature sub-sampling with less information loss. The mainstream sub-sampling methods are pooling layers and stride-2 convolution layers. However, max-pooling uses a "winner-take-all" strategy in which only the neuron with the maximum activation is selected as output; in the backward pass, only this neuron receives the back-propagated gradient. Stride-2 convolution suffers from information loss as well. Considering that semantic information is spread among multiple neurons, we propose ensemble max-pooling. In each pooling region, ensemble max-pooling drops the neuron with the maximum activation with probability p and outputs the second-largest neuron instead. Ensemble max-pooling can thus be viewed as an ensemble of many base networks. Experiments show that it achieves better results than other pooling approaches. Moreover, replacing each stride-2 convolution layer with a tandem of an ensemble pooling layer and a stride-1 convolution layer delivers significant gains in performance.

3. Compact convolution feature design without information blocking. Group convolution layers are commonly used in compact convolution feature design. However, channels in classical group convolution are independent, and no information communication happens among groups. The distributed nature of convolution features indicates that all channels participate in representing a semantic concept, so group convolution faces severe "information blocking". In this dissertation, we propose GCOS (Group Convolution with Shuffling), which uses 1 × 1 convolutions to shuffle information across groups. Combined with ThiNet, a state-of-the-art filter-level pruning method, the convolution feature can be made even more compact. We show that the original VGG-16 model (491.29 MB) can be compressed into a very small model (ThiNet-Tiny) of only 2.66 MB while still preserving AlexNet-level accuracy.
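The between-channel statistics used by Gram aggregation can be illustrated with a minimal numpy sketch. This is only an illustration of the general idea, a Gram matrix over the channels of a conv feature map; the dissertation's exact normalization and post-processing may differ, and the feature-map shapes here are arbitrary toy values.

```python
import numpy as np

def gram_aggregation(feat):
    """Aggregate a conv feature map of shape (C, H, W) into an image
    descriptor using between-channel second-order statistics (a Gram
    matrix). Minimal sketch; the exact formulation may differ."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)               # flatten spatial dimensions
    gram = x @ x.T / (h * w)                 # (C, C) between-channel correlations
    iu = np.triu_indices(c)                  # Gram is symmetric: keep upper triangle
    desc = gram[iu]
    return desc / (np.linalg.norm(desc) + 1e-12)  # L2-normalize for retrieval

# toy usage: a random "feature map" with 8 channels
desc = gram_aggregation(np.random.rand(8, 7, 7).astype(np.float32))
```

Because the Gram matrix is symmetric, only its upper triangle is kept, giving a descriptor of length C(C+1)/2 that captures how channels co-activate rather than only their individual pooled magnitudes.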
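The ensemble max-pooling rule described in contribution 2 can be sketched as follows. This is a single-channel 2 × 2 sketch under the assumption of even spatial dimensions; the actual layer in the dissertation operates per channel inside a network and its implementation details may differ.

```python
import numpy as np

def ensemble_max_pool2x2(x, p=0.5, rng=None):
    """2x2 ensemble max-pooling sketch: in each pooling region, the
    maximum activation is dropped with probability p and the
    second-largest activation is output instead."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = x.shape
    # gather each 2x2 region into a length-4 vector: (h//2, w//2, 4)
    regions = (x.reshape(h // 2, 2, w // 2, 2)
                .transpose(0, 2, 1, 3)
                .reshape(h // 2, w // 2, 4))
    sorted_r = np.sort(regions, axis=-1)       # ascending sort per region
    top1, top2 = sorted_r[..., -1], sorted_r[..., -2]
    drop = rng.random(top1.shape) < p          # drop the winner with prob p
    return np.where(drop, top2, top1)
```

With p = 0 this reduces to ordinary max-pooling, and with p = 1 it always outputs the runner-up; sampling the drop mask independently per region is what makes the layer behave like an ensemble of many base networks.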
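The "information blocking" of group convolution, and the 1 × 1 shuffle that GCOS adds, can also be sketched in numpy. For simplicity the grouped stage below uses 1 × 1 kernels (so each convolution is a per-pixel matrix multiply); real group convolutions typically use 3 × 3 kernels, and the dissertation's GCOS block may additionally include normalization and activation layers. All shapes and weights here are illustrative.

```python
import numpy as np

def group_conv1x1(x, weights):
    """Grouped 1x1 convolution: channels of x (C, H, W) are split into
    len(weights) groups, each transformed by its own weight matrix.
    No information crosses group boundaries."""
    groups = np.split(x, len(weights), axis=0)
    return np.concatenate(
        [np.einsum('oc,chw->ohw', w, g) for w, g in zip(weights, groups)],
        axis=0)

def gcos_block(x, group_weights, shuffle_weight):
    """GCOS sketch: a grouped convolution followed by a full 1x1
    convolution that shuffles information across all groups."""
    y = group_conv1x1(x, group_weights)                  # per-group only
    return np.einsum('oc,chw->ohw', shuffle_weight, y)   # mixes all groups

# toy usage: 8 input channels, 2 groups of 4, spatial size 5x5
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5, 5))
gw = [rng.standard_normal((4, 4)) for _ in range(2)]     # per-group weights
sw = rng.standard_normal((8, 8))                         # full 1x1 shuffle
out = gcos_block(x, gw, sw)
```

In the grouped stage, output channels 0–3 depend only on input channels 0–3; the trailing 1 × 1 convolution is what restores cross-group information flow, matching the distributed-representation argument above.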
Keywords/Search Tags: deep learning, convolutional neural networks, feature representation, feature sub-sampling, visual recognition