
Towards Convolutional Neural Network Acceleration And Compression Via K-Means Cluster

Posted on: 2019-11-12
Degree: Master
Type: Thesis
Country: China
Candidate: G L Chen
Full Text: PDF
GTID: 2428330611993257
Subject: Microelectronics and Solid State Electronics
Abstract/Summary:
Artificial neural networks are widely used in artificial-intelligence applications such as voice assistants, image recognition, and natural language processing. As these applications have grown more sophisticated, their computational cost has increased dramatically. Traditional general-purpose processors are limited by memory bandwidth and energy consumption when running complex neural networks, so their architectures have been extended to support efficient neural-network processing. Special-purpose accelerators offer another route: compared with general-purpose processors, they achieve lower energy consumption and higher performance, but no single dataflow suits the acceleration of every network. Traditional compression schemes such as pruning, low-rank factorization, and sparsification effectively reduce the number of network parameters, but they destroy the regular structure of the network and increase training complexity.

To overcome these limitations, this thesis exploits the confidence interval of the prediction accuracy learned by the network itself and proposes a method that uses K-means clustering to accelerate and compress neural networks. In a neural network, the convolutional layers are computation-intensive while the fully connected layers are storage-intensive: in the former, processing cannot keep up with memory access, and in the latter, memory access cannot keep up with processing. The amount of computation is reduced by compressing the input feature maps of the convolution with K-means, and the amount of storage is reduced by compressing the weights of the fully connected layers. Specifically, a K-means layer is inserted before the input of a convolution layer, and the input feature map is clustered. Suppose there are 1000 distinct values before clustering and 32 classes after clustering: only the 32 class centroids need to be multiplied by the weights, and the products for the original 1000 values are then obtained by lookup, which reduces the amount of computation. For the fully connected layers, K-means clustering is applied to the weights, and each cluster centroid serves as an indexed value; when the model is stored, only the index label of each clustered weight is kept. Compared with the original 32-bit weights, an index label usually needs only three or four bits (depending on the number of clusters), which achieves compression.

The proposed method reduces the computation of a single convolution layer of the AlexNet network by up to 100 times. By inserting K-means layers at appropriate positions, the speedup of the whole network reaches 2.007 and the compression ratio reaches 10.4. Finally, a hardware architecture is designed to match this processing flow.
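The acceleration idea described above can be sketched numerically. The following is a minimal illustration, not the thesis's actual implementation: `kmeans_1d` is a plain Lloyd's-algorithm K-means written for this example, and the 1000 activations, the 32 classes, and the 9 weights are toy figures taken from (or assumed alongside) the abstract's running example.

```python
import numpy as np

def kmeans_1d(values, k, iters=20, seed=0):
    """Plain 1-D K-means (Lloyd's algorithm); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = rng.choice(values, size=k, replace=False).astype(np.float64)
    for _ in range(iters):
        # Assign each value to its nearest centroid.
        labels = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = values[labels == j]
            if members.size:
                centroids[j] = members.mean()
    return centroids, labels

rng = np.random.default_rng(1)
x = rng.normal(size=1000)        # 1000 input activations (figure from the text)
weights = rng.normal(size=9)     # e.g. one flattened 3x3 kernel (assumed size)

centroids, labels = kmeans_1d(x, 32)   # 32 classes after clustering

# Precompute every centroid-by-weight product: only 32 * 9 multiplications.
table = np.outer(centroids, weights)

# Recover all 1000 * 9 products by table lookup instead of multiplication.
approx = table[labels]           # shape (1000, 9), zero new multiplies
exact = np.outer(x, weights)     # the full 1000 * 9 multiplications
```

The multiply count drops from 1000 × 9 to 32 × 9, at the cost of a small quantization error determined by how well 32 centroids cover the activation distribution.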
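The fully-connected-layer compression can be sketched the same way. Again this is an illustrative assumption, not the thesis's code: `kmeans_1d` is a toy Lloyd's-algorithm K-means, the 4096-element weight vector is invented, and 16 clusters are chosen so that each index label fits in 4 bits, matching the "three or four bits" figure in the abstract.

```python
import numpy as np

def kmeans_1d(values, k, iters=20, seed=0):
    """Plain 1-D K-means (Lloyd's algorithm); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = rng.choice(values, size=k, replace=False).astype(np.float64)
    for _ in range(iters):
        labels = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        for j in range(k):
            members = values[labels == j]
            if members.size:
                centroids[j] = members.mean()
    return centroids, labels

rng = np.random.default_rng(0)
fc_weights = rng.normal(size=4096)      # toy fully connected weight vector

k = 16                                  # 16 clusters -> 4-bit index labels
codebook, labels = kmeans_1d(fc_weights, k)
index_bits = int(np.ceil(np.log2(k)))   # bits per stored label (4 here)

# Stored model: one small index per weight plus a tiny 32-bit codebook,
# instead of a full 32-bit float per weight.
original_bits = fc_weights.size * 32
compressed_bits = fc_weights.size * index_bits + k * 32
ratio = original_bits / compressed_bits  # roughly 32 / index_bits

# Decompression is a single table lookup.
restored = codebook[labels]
```

With 4-bit labels the ratio approaches 32/4 = 8×; the codebook overhead is negligible because it holds only `k` values regardless of layer size.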
Keywords/Search Tags: Neural Network, Confidence Interval, K-means Algorithm, Acceleration and Compression