
Towards Convolutional Neural Network Acceleration And Compression Via K-Means Cluster

Posted on: 2019-11-12
Degree: Master
Type: Thesis
Country: China
Candidate: G L Chen
Full Text: PDF
GTID: 2428330611993257
Subject: Microelectronics and Solid State Electronics
Abstract/Summary:
Artificial neural networks are widely used in artificial-intelligence applications such as voice assistants, image recognition, and natural language processing. As these applications have grown more sophisticated, their computational cost has increased dramatically. Traditional general-purpose processors are limited by memory bandwidth and energy consumption when running complex neural networks, so their architectures have been extended to support efficient neural-network processing. Special-purpose accelerators offer another route: compared with general-purpose processors, they achieve lower energy consumption and higher performance, but no single dataflow suits the acceleration of every network. Traditional compression schemes such as pruning, low-rank factorization, and sparsification effectively reduce the number of network parameters, but they destroy the regular structure of the network and increase training complexity.

To overcome these limitations, this thesis exploits the confidence interval of the prediction accuracy learned by the network itself and proposes a method that uses K-means clustering to accelerate and compress neural networks. In a neural network, the convolutional layers are computation-intensive while the fully connected layers are storage-intensive: in the former, processing cannot keep up with memory access, and in the latter, memory access cannot keep up with processing. The amount of computation is reduced by compressing the input feature maps of the convolution with K-means, and the amount of storage is reduced by compressing the weights of the fully connected layers. Specifically, a K-means layer is inserted before the input of a convolution layer, and the input feature map is clustered. Suppose there are 1000 distinct values before clustering and 32 classes after clustering: only the 32 class centroids need to be multiplied by the weights, and the products for the original 1000 values are then obtained by lookup, which reduces the amount of computation. For the fully connected layers, K-means clustering is applied to the weights, and each cluster centroid serves as an indexed value; when the model is stored, only the index label of each clustered weight is kept. Compared with the original 32-bit weights, an index label usually needs only three or four bits (depending on the number of clusters), which achieves compression.

The proposed method reduces the computation of a single convolution layer of the AlexNet network by up to 100 times. By inserting K-means layers at appropriate positions, the speedup of the whole network reaches 2.007 and the compression ratio reaches 10.4. Finally, a hardware architecture is designed to match this processing flow.
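The acceleration idea described above can be sketched numerically. The following is a minimal illustration, not the thesis's actual implementation: `kmeans_1d` is a plain Lloyd's-algorithm K-means written for this example, and the 1000 activations, the 32 classes, and the 9 weights are toy figures taken from (or assumed alongside) the abstract's running example.

```python
import numpy as np

def kmeans_1d(values, k, iters=20, seed=0):
    """Plain 1-D K-means (Lloyd's algorithm); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = rng.choice(values, size=k, replace=False).astype(np.float64)
    for _ in range(iters):
        # Assign each value to its nearest centroid.
        labels = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = values[labels == j]
            if members.size:
                centroids[j] = members.mean()
    return centroids, labels

rng = np.random.default_rng(1)
x = rng.normal(size=1000)        # 1000 input activations (figure from the text)
weights = rng.normal(size=9)     # e.g. one flattened 3x3 kernel (assumed size)

centroids, labels = kmeans_1d(x, 32)   # 32 classes after clustering

# Precompute every centroid-by-weight product: only 32 * 9 multiplications.
table = np.outer(centroids, weights)

# Recover all 1000 * 9 products by table lookup instead of multiplication.
approx = table[labels]           # shape (1000, 9), zero new multiplies
exact = np.outer(x, weights)     # the full 1000 * 9 multiplications
```

The multiply count drops from 1000 × 9 to 32 × 9, at the cost of a small quantization error determined by how well 32 centroids cover the activation distribution.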
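The fully-connected-layer compression can be sketched the same way. Again this is an illustrative assumption, not the thesis's code: `kmeans_1d` is a toy Lloyd's-algorithm K-means, the 4096-element weight vector is invented, and 16 clusters are chosen so that each index label fits in 4 bits, matching the "three or four bits" figure in the abstract.

```python
import numpy as np

def kmeans_1d(values, k, iters=20, seed=0):
    """Plain 1-D K-means (Lloyd's algorithm); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = rng.choice(values, size=k, replace=False).astype(np.float64)
    for _ in range(iters):
        labels = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        for j in range(k):
            members = values[labels == j]
            if members.size:
                centroids[j] = members.mean()
    return centroids, labels

rng = np.random.default_rng(0)
fc_weights = rng.normal(size=4096)      # toy fully connected weight vector

k = 16                                  # 16 clusters -> 4-bit index labels
codebook, labels = kmeans_1d(fc_weights, k)
index_bits = int(np.ceil(np.log2(k)))   # bits per stored label (4 here)

# Stored model: one small index per weight plus a tiny 32-bit codebook,
# instead of a full 32-bit float per weight.
original_bits = fc_weights.size * 32
compressed_bits = fc_weights.size * index_bits + k * 32
ratio = original_bits / compressed_bits  # roughly 32 / index_bits

# Decompression is a single table lookup.
restored = codebook[labels]
```

With 4-bit labels the ratio approaches 32/4 = 8×; the codebook overhead is negligible because it holds only `k` values regardless of layer size.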
Keywords/Search Tags: Neural Network, Confidence Interval, K-means Algorithm, Acceleration and Compression