
Research on Convolutional Neural Network Compression and Forward Inference Acceleration

Posted on: 2019-08-30 | Degree: Master | Type: Thesis
Country: China | Candidate: H Wu | Full Text: PDF
GTID: 2428330551956833 | Subject: Computer Science and Technology

Abstract/Summary:
Benefiting from large data sets and the efficient parallel computing power of GPUs, deep learning has advanced by leaps and bounds in the past few years. Today, deep learning outperforms humans in many areas, and researchers have begun to industrialize the technology; devices such as smart speakers and smartphones have gradually become the first targets for deploying deep learning. For a long time, only large neural networks could meet the performance requirements of commercial products. However, many devices have very limited computing resources and cannot meet the computational demands of large neural networks in terms of time, memory, or energy consumption. To accelerate and compress existing neural networks, this thesis carries out research in the following three aspects.

To identify the parts of a convolutional neural network that need optimization, this thesis first collects statistics on the FLOPs and parameters of each network layer. The results show that the convolution layers are the most computationally intensive part of a convolutional neural network, while the fully connected layers account for the largest share of parameters. Next, to understand the impact of model compression on accuracy, this thesis also analyzes the weight distribution of AlexNet. The experiments show that both the convolution layers and the fully connected layers contain a large number of parameters close to zero. These parameters contribute little to the model, so the network structure can be compressed without affecting the original accuracy.

Based on these statistical experiments, this thesis proposes an acceleration strategy that improves the memory access continuity of the convolution operation. In the deep learning framework Caffe, convolution is implemented as a matrix multiplication and consists of two main operations, im2col and gemm. im2col (Image to Columns) expands the input image into columns; gemm (General Matrix-Matrix Multiplication) multiplies the resulting matrix with the weight matrix. On a row-major architecture, changing the data layout of the input image through a transpose operation can improve the memory access efficiency of im2col and gemm at the same time. Experimental results show that the improved convolution achieves a speedup of around 40%.

In addition to improving the convolution operation, this thesis also proposes a new compression algorithm to address excessive model size. In a pretrained model, the neurons of each layer have a fixed mapping relationship. Because of the large amount of redundancy in a convolutional neural network, the original mapping can still be maintained after some parameters are removed. The proposed method extracts the input-output relations of all samples in the test set into a smaller network structure by removing redundant neurons and convolution kernels. The compressed model is not only smaller and faster, but its accuracy is also unaffected. Experimental results show a compression ratio of about 4x to 21x and a speedup of about 2x to 5x.
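
As an illustration of the statistical analysis described above, the sketch below shows the standard way to count FLOPs and parameters for a convolution layer and a fully connected layer. The function names and the layer shapes in the usage lines are illustrative assumptions, not figures taken from the thesis.

    def conv_stats(c_in, c_out, k, h_out, w_out):
        # Parameters of a k x k convolution (bias terms ignored).
        params = c_out * c_in * k * k
        # One multiply-accumulate per weight per output position, counted as 2 FLOPs.
        flops = 2 * params * h_out * w_out
        return flops, params

    def fc_stats(n_in, n_out):
        # Parameters and FLOPs of a fully connected layer (bias terms ignored).
        params = n_in * n_out
        flops = 2 * params
        return flops, params

    # Illustrative shapes only: a mid-sized conv layer vs. a large fc layer.
    print("conv:", conv_stats(c_in=96, c_out=256, k=5, h_out=27, w_out=27))
    print("fc:  ", fc_stats(n_in=9216, n_out=4096))

Counts like these, taken over a whole network, reproduce the pattern reported above: convolution layers dominate the computation, while fully connected layers hold most of the parameters.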
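
The convolution pipeline discussed above (im2col followed by gemm) can be sketched as follows. This is a plain NumPy rendition of the general idea, not the thesis's transposed-layout variant; the function name and shapes are assumptions for illustration.

    import numpy as np

    def im2col(image, k, stride=1):
        # Unroll k x k patches of a (C, H, W) image into the columns of a
        # (C*k*k, H_out*W_out) matrix, as in Caffe-style convolution.
        c, h, w = image.shape
        h_out = (h - k) // stride + 1
        w_out = (w - k) // stride + 1
        cols = np.empty((c * k * k, h_out * w_out), dtype=image.dtype)
        for ci in range(c):
            for ki in range(k):
                for kj in range(k):
                    row = (ci * k + ki) * k + kj
                    patch = image[ci,
                                  ki:ki + stride * h_out:stride,
                                  kj:kj + stride * w_out:stride]
                    cols[row, :] = patch.reshape(-1)
        return cols

    # Convolution as im2col + gemm: filters reshaped to (C_out, C*k*k).
    x = np.random.rand(3, 8, 8).astype(np.float32)
    w = np.random.rand(16, 3, 3, 3).astype(np.float32)
    y = w.reshape(16, -1) @ im2col(x, k=3)   # output shape (16, 36)

In a row-major layout, the copy in the inner loop walks along rows of the input; the thesis's contribution is to rearrange the input layout so that both this expansion and the subsequent matrix multiplication access memory sequentially.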
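
Finally, the compression idea of removing redundant convolution kernels can be sketched with a simple magnitude criterion. The mean-absolute-weight importance score and the threshold below are assumptions chosen for illustration; the thesis derives which neurons and kernels to remove from the input-output relations over the test set.

    import numpy as np

    def prune_conv_kernels(weights, threshold=1e-2):
        # weights has shape (C_out, C_in, k, k); drop output kernels whose
        # average absolute weight falls below the threshold.
        importance = np.abs(weights).mean(axis=(1, 2, 3))
        keep = np.where(importance >= threshold)[0]
        return weights[keep], keep

    # The next layer must be sliced with the same `keep` indices along its
    # input-channel axis so the original input-output mapping is preserved.
    w_conv = np.random.randn(64, 32, 3, 3).astype(np.float32)
    w_conv[::2] *= 1e-4   # make half of the kernels near-zero, mimicking the AlexNet statistics
    w_small, keep = prune_conv_kernels(w_conv, threshold=1e-2)
    print(w_conv.shape, "->", w_small.shape)   # (64, 32, 3, 3) -> (32, 32, 3, 3)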
Keywords/Search Tags: Convolutional Neural Network, Forward Inference, Acceleration, Compression