
Research On Efficient And Compact Convolution Features

Posted on: 2020-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: H Zhang
Full Text: PDF
GTID: 2428330575954957
Subject: Computer technology
Abstract/Summary:
Deep learning adopts a stack of multiple non-linear transformations to extract high-level representations from data. Benefiting from massive training data and great computation power, deep convolutional neural networks have become the state-of-the-art methods for several computer vision tasks. Despite their excellent performance, deep neural networks demand a huge resource cost, and in resource-constrained environments these "cumbersome" models are hard to deploy in many real-world tasks. The core of a convolutional neural network is its stack of convolution layers for feature extraction. Therefore, designing efficient and compact convolution features has great scientific significance and real-world application value. A convolution feature is a distributed representation, meaning there is a "many-to-many" mapping between channels and semantic concepts. Based on this distributed nature of convolution features, this dissertation aims at designing efficient and compact convolution features. The main contributions are summarized as follows.

1. Efficient convolution feature extraction based on pre-trained models. Obtaining labeled training data is expensive. A recent research focus is to extract convolution feature representations from deep models pre-trained on a large-scale dataset, without resorting to image labels. Current approaches typically generate image representations by global pooling within each channel of the convolutional result independently. However, since convolutional results have distributed concepts spreading among channels, these methods cannot sufficiently excavate the valuable underlying cues in pre-trained models. In this dissertation, we propose Gram aggregation to create powerful image representations based on both within-channel and between-channel statistics of convolutional results. Taking unsupervised image retrieval as an example, experimental results show that between-channel correlations play a crucial role in retrieval performance.

2. Efficient convolution feature sub-sampling with less information loss. The mainstream sub-sampling methods are pooling layers and stride-2 convolution layers. However, max-pooling uses a "winner-take-all" strategy in which only the neuron with the maximum activation is selected as output; in the backward pass, only this neuron receives the back-propagated gradient. Stride-2 convolution suffers from information loss as well. Considering that semantic information is spread among multiple neurons, we propose ensemble max-pooling. In each pooling region, ensemble max-pooling drops the neuron with the maximum activation with probability p and outputs the second-largest neuron instead. Ensemble max-pooling can thus be viewed as an ensemble of many base networks. Experiments show that it achieves better results than other pooling approaches. Moreover, replacing each stride-2 convolution layer with a tandem of an ensemble pooling layer and a stride-1 convolution layer delivers significant gains in performance.

3. Compact convolution feature design without information blocking. Group convolution layers are commonly used in compact convolution feature design. However, channels in classical group convolution are independent, and no information communication happens among groups. The distributed nature of convolution features indicates that all channels participate in representing a semantic concept, so group convolution faces severe "information blocking". In this dissertation, we propose GCOS (Group Convolution with Shuffling), which uses 1 × 1 convolutions to shuffle information across groups. Combined with ThiNet, a state-of-the-art filter-level pruning method, the convolution feature can be made even more compact. We show that the original VGG-16 model (491.29 MB) can be compressed into a very small model (ThiNet-Tiny) of only 2.66 MB while still preserving AlexNet-level accuracy.
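The between-channel statistics used by Gram aggregation can be illustrated with a minimal numpy sketch. This is only an illustration of the general idea, a Gram matrix over the channels of a conv feature map; the dissertation's exact normalization and post-processing may differ, and the feature-map shapes here are arbitrary toy values.

```python
import numpy as np

def gram_aggregation(feat):
    """Aggregate a conv feature map of shape (C, H, W) into an image
    descriptor using between-channel second-order statistics (a Gram
    matrix). Minimal sketch; the exact formulation may differ."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)               # flatten spatial dimensions
    gram = x @ x.T / (h * w)                 # (C, C) between-channel correlations
    iu = np.triu_indices(c)                  # Gram is symmetric: keep upper triangle
    desc = gram[iu]
    return desc / (np.linalg.norm(desc) + 1e-12)  # L2-normalize for retrieval

# toy usage: a random "feature map" with 8 channels
desc = gram_aggregation(np.random.rand(8, 7, 7).astype(np.float32))
```

Because the Gram matrix is symmetric, only its upper triangle is kept, giving a descriptor of length C(C+1)/2 that captures how channels co-activate rather than only their individual pooled magnitudes.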
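The ensemble max-pooling rule described in contribution 2 can be sketched as follows. This is a single-channel 2 × 2 sketch under the assumption of even spatial dimensions; the actual layer in the dissertation operates per channel inside a network and its implementation details may differ.

```python
import numpy as np

def ensemble_max_pool2x2(x, p=0.5, rng=None):
    """2x2 ensemble max-pooling sketch: in each pooling region, the
    maximum activation is dropped with probability p and the
    second-largest activation is output instead."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = x.shape
    # gather each 2x2 region into a length-4 vector: (h//2, w//2, 4)
    regions = (x.reshape(h // 2, 2, w // 2, 2)
                .transpose(0, 2, 1, 3)
                .reshape(h // 2, w // 2, 4))
    sorted_r = np.sort(regions, axis=-1)       # ascending sort per region
    top1, top2 = sorted_r[..., -1], sorted_r[..., -2]
    drop = rng.random(top1.shape) < p          # drop the winner with prob p
    return np.where(drop, top2, top1)
```

With p = 0 this reduces to ordinary max-pooling, and with p = 1 it always outputs the runner-up; sampling the drop mask independently per region is what makes the layer behave like an ensemble of many base networks.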
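The "information blocking" of group convolution, and the 1 × 1 shuffle that GCOS adds, can also be sketched in numpy. For simplicity the grouped stage below uses 1 × 1 kernels (so each convolution is a per-pixel matrix multiply); real group convolutions typically use 3 × 3 kernels, and the dissertation's GCOS block may additionally include normalization and activation layers. All shapes and weights here are illustrative.

```python
import numpy as np

def group_conv1x1(x, weights):
    """Grouped 1x1 convolution: channels of x (C, H, W) are split into
    len(weights) groups, each transformed by its own weight matrix.
    No information crosses group boundaries."""
    groups = np.split(x, len(weights), axis=0)
    return np.concatenate(
        [np.einsum('oc,chw->ohw', w, g) for w, g in zip(weights, groups)],
        axis=0)

def gcos_block(x, group_weights, shuffle_weight):
    """GCOS sketch: a grouped convolution followed by a full 1x1
    convolution that shuffles information across all groups."""
    y = group_conv1x1(x, group_weights)                  # per-group only
    return np.einsum('oc,chw->ohw', shuffle_weight, y)   # mixes all groups

# toy usage: 8 input channels, 2 groups of 4, spatial size 5x5
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5, 5))
gw = [rng.standard_normal((4, 4)) for _ in range(2)]     # per-group weights
sw = rng.standard_normal((8, 8))                         # full 1x1 shuffle
out = gcos_block(x, gw, sw)
```

In the grouped stage, output channels 0–3 depend only on input channels 0–3; the trailing 1 × 1 convolution is what restores cross-group information flow, matching the distributed-representation argument above.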
Keywords/Search Tags: deep learning, convolutional neural networks, feature representation, feature sub-sampling, visual recognition