
Research On Compression And Acceleration Of Deep Convolutional Neural Networks

Posted on: 2022-06-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q B Guo
Full Text: PDF
GTID: 1488306725951489
Subject: Control Science and Engineering
Abstract/Summary:
In recent years, deep convolutional neural networks have made great advances in a variety of computer vision tasks, such as image classification, object recognition, and semantic segmentation. Increasing network depth greatly improves recognition performance, but it relies on a large number of parameters and heavy computation. Most embedded systems and mobile platforms, owing to their limited storage and computing resources, can barely afford such resource requirements, which has seriously hindered the deployment and application of deep neural networks. Compression and acceleration techniques for deep convolutional neural networks are therefore a new research topic arising from the rapid development of deep learning.

A large body of evidence has shown that deep neural networks contain redundant parameters and can be compressed with little or no accuracy loss. To realize model compression and acceleration, researchers have proposed many theories and methods, including network pruning, network quantization, low-rank decomposition, knowledge distillation, compact network design, and neural architecture search. These methods reduce the number of parameters and the computational complexity from different perspectives while minimizing the performance loss of neural networks. Building on these methods, this thesis analyzes the shortcomings of existing compression and acceleration techniques and explores effective and efficient compression methods, so as to obtain neural network models with fewer parameters, faster inference, and higher accuracy. The main contributions include:

(1) A channel pruning method based on a Taylor expansion of the next convolution layer is proposed. Since pruning a channel changes the output of the subsequent convolution layer, this method uses a loss approximation based on the Taylor expansion to evaluate the importance of each candidate channel, aiming to minimize the cumulative impact on the network output. To improve learning efficiency, one part of the training dataset is used for ranking channel importance and the other part for recovering network performance through retraining. The method reduces the number of parameters and the computational complexity at the cost of only a small loss of accuracy, yielding a compact and efficient neural network.

(2) A novel weak-subnetwork pruning method is presented. The relationship between activation sparsity and gradient sparsity is analyzed theoretically, and the l1 norm of both is used to identify weak subnetworks in deep neural networks. Each channel in a weak subnetwork has minimal impact in both forward and backward propagation. Pruning the whole weak subnetwork in a one-shot manner effectively improves the efficiency of network compression, and the accuracy of the pruned network is restored by subsequent retraining.

(3) A novel self-grouping convolutional neural network is proposed. For each filter, the method constructs an importance vector over its input channels based on the l1 norm of the connections. The filters of each layer are then grouped by clustering according to the similarity of their importance vectors. Using the cluster centroids as prior knowledge, the unimportant connections of each group are pruned, yielding a group convolution with diverse structures. Global fine-tuning is subsequently applied to maintain the representation capability of the network, resulting in a compact and efficient deep neural network.
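To make the grouping idea in (3) concrete, here is a minimal sketch, assuming a plain k-means clustering over l1-norm importance vectors and a fixed keep ratio. The layer sizes, the number of groups, and names such as `imp` and `keep_ratio` are illustrative assumptions rather than the thesis' implementation, and the subsequent global fine-tuning step is omitted.

```python
# Minimal sketch of the self-grouping step (assumption: plain k-means over
# l1-norm importance vectors with a fixed keep ratio; layer sizes, the number
# of groups, and all variable names are illustrative, not the thesis code).
import torch

conv = torch.nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1)
W = conv.weight.detach()                # shape (32, 16, 3, 3)

# Importance vector of each filter: l1 norm of its connection to each input channel
imp = W.abs().sum(dim=(2, 3))           # shape (32, 16)

# Tiny k-means over the importance vectors (2 groups, a few iterations)
k, iters = 2, 10
centroids = imp[torch.randperm(imp.size(0))[:k]].clone()
for _ in range(iters):
    assign = torch.cdist(imp, centroids).argmin(dim=1)   # group index per filter
    for g in range(k):
        members = imp[assign == g]
        if members.size(0) > 0:
            centroids[g] = members.mean(dim=0)

# Within each group, keep only the input channels its centroid marks as important
keep_ratio = 0.5
with torch.no_grad():
    for g in range(k):
        kept = centroids[g].topk(int(keep_ratio * imp.size(1))).indices
        mask = torch.zeros(imp.size(1))
        mask[kept] = 1.0
        # zero connections to unimportant input channels for all filters of group g
        conv.weight[assign == g] *= mask.view(1, -1, 1, 1)
```

In the thesis' method the resulting groups would then be assembled into a group convolution and the whole network globally fine-tuned; the sketch stops at the connection-masking step.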
(4) A differentiable neural architecture learning method is proposed. It uses a scaled sigmoid function for architecture learning, converting the discrete optimization problem over neural architectures into a continuous optimization problem of scaled sigmoid gates, and it introduces no additional candidate architectures. To avoid interference between weight parameters and architecture parameters, the optimization of the network is decoupled into weight optimization and architecture optimization, which also alleviates the gradient vanishing problem. The method demonstrates superior performance and is applicable to traditional convolutional neural networks (e.g., VGG and ResNet), lightweight convolutional neural networks (e.g., MobileNetV2), and stochastic supernets (e.g., ProxylessNAS).
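The scaled sigmoid relaxation in (4) can be read as a learnable per-channel (or per-operation) gate whose architecture parameter is trained separately from the network weights. The sketch below is a hedged illustration under that reading: the gate form sigmoid(beta * alpha), the annealing schedule for `beta`, the alternating update rule, and all names (`ScaledSigmoidGate`, `alpha`, `beta`) are assumptions for illustration, not the thesis' implementation.

```python
# Hedged sketch of a scaled sigmoid gate (assumptions: the gate form
# sigmoid(beta * alpha), the annealing of beta, and the alternating update
# schedule are illustrative choices; the class and variable names are not
# taken from the thesis).
import torch
import torch.nn as nn

class ScaledSigmoidGate(nn.Module):
    """Continuous relaxation of a binary keep/drop decision for each channel."""

    def __init__(self, num_channels, beta=1.0):
        super().__init__()
        self.alpha = nn.Parameter(torch.zeros(num_channels))  # architecture parameters
        self.beta = beta                                       # scale, annealed upward

    def forward(self, x):
        gate = torch.sigmoid(self.beta * self.alpha)           # values in (0, 1)
        return x * gate.view(1, -1, 1, 1)

conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
net = nn.Sequential(conv, ScaledSigmoidGate(16))

# Decoupled optimization: one optimizer for weights, another for architecture parameters
w_opt = torch.optim.SGD(conv.parameters(), lr=0.1)
a_opt = torch.optim.Adam([net[1].alpha], lr=0.01)

x = torch.randn(4, 3, 32, 32)
for step in range(4):
    net[1].beta = 1.0 + step                 # harden the gates toward 0/1 over time
    loss = net(x).pow(2).mean()              # stand-in for task loss plus a sparsity term
    w_opt.zero_grad(); a_opt.zero_grad()
    loss.backward()
    # decoupled updates: alternate weight steps and architecture steps
    (w_opt if step % 2 == 0 else a_opt).step()

# Channels whose hardened gates fall below 0.5 would be removed from the final network
print((torch.sigmoid(net[1].beta * net[1].alpha) < 0.5).sum().item())
```

As `beta` grows, the gates saturate toward 0 or 1, so the continuous solution approaches a discrete architecture choice without enumerating additional candidate architectures.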
Keywords/Search Tags:convolutional neural network, model compression and acceleration, network pruning, group convolution, neural architecture search