
Research on Convolutional Neural Network Compression and Forward Inference Acceleration

Posted on: 2019-08-30 | Degree: Master | Type: Thesis
Country: China | Candidate: H Wu | Full Text: PDF
GTID: 2428330551956833 | Subject: Computer Science and Technology

Abstract/Summary:
Benefiting from large data sets and the efficient parallel computing power of GPUs, deep learning has advanced by leaps and bounds in the past few years. Today, deep learning outperforms humans in many areas, and researchers have begun to industrialize the technology; devices such as smart speakers and smartphones have gradually become the first targets for deploying deep learning. For a long time, only large neural networks could meet the performance requirements of commercial products. However, many devices have very limited computing resources and cannot meet the computational demands of large neural networks in terms of time, memory, or energy consumption. To accelerate and compress existing neural networks, this thesis carries out research in the following three aspects.

To identify the parts of a convolutional neural network that need optimization, this thesis first collects statistics on the FLOPs and parameters of each network layer. The results show that the convolution layers are the most computationally intensive part of a convolutional neural network, while the fully connected layers account for the largest share of parameters. Next, to understand the impact of model compression on accuracy, this thesis also analyzes the weight distribution of AlexNet. The experiments show that both the convolution layers and the fully connected layers contain a large number of parameters close to zero. These parameters contribute little to the model, so the network structure can be compressed without affecting the original accuracy.

Based on these statistical experiments, this thesis proposes an acceleration strategy that improves the memory access continuity of the convolution operation. In the deep learning framework Caffe, convolution is implemented as a matrix multiplication and consists of two main operations, im2col and gemm. im2col (Image to Columns) expands the input image into columns; gemm (General Matrix-Matrix Multiplication) multiplies the resulting matrix with the weight matrix. On a row-major architecture, changing the data layout of the input image through a transpose operation can improve the memory access efficiency of im2col and gemm at the same time. Experimental results show that the improved convolution achieves a speedup of around 40%.

In addition to improving the convolution operation, this thesis also proposes a new compression algorithm to address excessive model size. In a pretrained model, the neurons of each layer have a fixed mapping relationship. Because of the large amount of redundancy in a convolutional neural network, the original mapping can still be maintained after some parameters are removed. The proposed method extracts the input-output relations of all samples in the test set into a smaller network structure by removing redundant neurons and convolution kernels. The compressed model is not only smaller and faster, but its accuracy is also unaffected. Experimental results show a compression ratio of about 4x to 21x and a speedup of about 2x to 5x.
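
As an illustration of the statistical analysis described above, the sketch below shows the standard way to count FLOPs and parameters for a convolution layer and a fully connected layer. The function names and the layer shapes in the usage lines are illustrative assumptions, not figures taken from the thesis.

    def conv_stats(c_in, c_out, k, h_out, w_out):
        # Parameters of a k x k convolution (bias terms ignored).
        params = c_out * c_in * k * k
        # One multiply-accumulate per weight per output position, counted as 2 FLOPs.
        flops = 2 * params * h_out * w_out
        return flops, params

    def fc_stats(n_in, n_out):
        # Parameters and FLOPs of a fully connected layer (bias terms ignored).
        params = n_in * n_out
        flops = 2 * params
        return flops, params

    # Illustrative shapes only: a mid-sized conv layer vs. a large fc layer.
    print("conv:", conv_stats(c_in=96, c_out=256, k=5, h_out=27, w_out=27))
    print("fc:  ", fc_stats(n_in=9216, n_out=4096))

Counts like these, taken over a whole network, reproduce the pattern reported above: convolution layers dominate the computation, while fully connected layers hold most of the parameters.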
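
The convolution pipeline discussed above (im2col followed by gemm) can be sketched as follows. This is a plain NumPy rendition of the general idea, not the thesis's transposed-layout variant; the function name and shapes are assumptions for illustration.

    import numpy as np

    def im2col(image, k, stride=1):
        # Unroll k x k patches of a (C, H, W) image into the columns of a
        # (C*k*k, H_out*W_out) matrix, as in Caffe-style convolution.
        c, h, w = image.shape
        h_out = (h - k) // stride + 1
        w_out = (w - k) // stride + 1
        cols = np.empty((c * k * k, h_out * w_out), dtype=image.dtype)
        for ci in range(c):
            for ki in range(k):
                for kj in range(k):
                    row = (ci * k + ki) * k + kj
                    patch = image[ci,
                                  ki:ki + stride * h_out:stride,
                                  kj:kj + stride * w_out:stride]
                    cols[row, :] = patch.reshape(-1)
        return cols

    # Convolution as im2col + gemm: filters reshaped to (C_out, C*k*k).
    x = np.random.rand(3, 8, 8).astype(np.float32)
    w = np.random.rand(16, 3, 3, 3).astype(np.float32)
    y = w.reshape(16, -1) @ im2col(x, k=3)   # output shape (16, 36)

In a row-major layout, the copy in the inner loop walks along rows of the input; the thesis's contribution is to rearrange the input layout so that both this expansion and the subsequent matrix multiplication access memory sequentially.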
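
Finally, the compression idea of removing redundant convolution kernels can be sketched with a simple magnitude criterion. The mean-absolute-weight importance score and the threshold below are assumptions chosen for illustration; the thesis derives which neurons and kernels to remove from the input-output relations over the test set.

    import numpy as np

    def prune_conv_kernels(weights, threshold=1e-2):
        # weights has shape (C_out, C_in, k, k); drop output kernels whose
        # average absolute weight falls below the threshold.
        importance = np.abs(weights).mean(axis=(1, 2, 3))
        keep = np.where(importance >= threshold)[0]
        return weights[keep], keep

    # The next layer must be sliced with the same `keep` indices along its
    # input-channel axis so the original input-output mapping is preserved.
    w_conv = np.random.randn(64, 32, 3, 3).astype(np.float32)
    w_conv[::2] *= 1e-4   # make half of the kernels near-zero, mimicking the AlexNet statistics
    w_small, keep = prune_conv_kernels(w_conv, threshold=1e-2)
    print(w_conv.shape, "->", w_small.shape)   # (64, 32, 3, 3) -> (32, 32, 3, 3)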
Keywords/Search Tags: Convolutional Neural Network, Forward Inference, Acceleration, Compression