
Research On The Compression And Hardware Acceleration Based On Convolutional Neural Network

Posted on: 2020-04-20    Degree: Master    Type: Thesis
Country: China    Candidate: H H Wu    Full Text: PDF
GTID: 2428330599452876    Subject: Electronic and communication engineering
Abstract/Summary:
Convolutional Neural Networks (CNNs) have developed rapidly in the fields of image, speech, and face recognition, especially image recognition. In practice, applications must often run on embedded platforms with small size and low power consumption, yet CNNs are characterized by huge parameter counts, complex network models, and heavy computation, which makes it difficult for them to run quickly on embedded devices.

To address the huge parameter counts and complex network models, this paper proposes network pruning and weight quantization to compress the network; to address the heavy computation, it accelerates the calculation with an FPGA. The network studied in this paper is Tiny-yolo.

First, the connection structure of the Tiny-yolo network is analyzed, and connections with small weights are pruned to reduce the number of weights and compress the network. The pruned weight matrix is kept in a sparse storage format to reduce the memory footprint of the network model, and the sparse network is retrained so that recognition accuracy does not drop sharply after pruning.

Second, the weights are quantized: the original 32-bit floating-point values of Tiny-yolo are quantized to an 8-bit signed char data type, further reducing the model's memory footprint and computational complexity while keeping the accuracy error within an acceptable range.

Finally, based on the structural characteristics of the Tiny-yolo network, a deeply parallel, pipelined FPGA acceleration and optimization scheme is proposed that speeds up data caching and the convolution operations, so that the Tiny-yolo network runs quickly on the embedded end.

The experimental results show that, while maintaining recognition accuracy, pruning reduces the number of parameters by 90% and shrinks the model from 63.5 MB to 4.55 MB. Quantization achieves a compression ratio of about 4X at the cost of a roughly 2-percentage-point drop in mAP, which has little effect on the final detection results. Compared with running on an ARM Cortex-A9 at its maximum frequency of 667 MHz, the hardware-accelerated design achieves about a 7X speedup.
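As a concrete illustration of the pruning step described above, the following is a minimal sketch in C of magnitude-based pruning with compressed sparse row (CSR) storage. The threshold value, the toy matrix size, and the choice of CSR are illustrative assumptions, not details taken from the thesis.

```c
/* Minimal sketch: magnitude-based pruning packed into CSR storage.
 * Threshold and matrix size are illustrative, not from the thesis. */
#include <stdio.h>
#include <math.h>

/* Drop entries with |w| < thresh; pack survivors into CSR
 * (values, column indices, row pointers). Returns the count kept. */
static int prune_to_csr(const float *w, int rows, int cols, float thresh,
                        float *vals, int *col_idx, int *row_ptr)
{
    int nnz = 0;
    for (int r = 0; r < rows; r++) {
        row_ptr[r] = nnz;
        for (int c = 0; c < cols; c++) {
            float v = w[r * cols + c];
            if (fabsf(v) >= thresh) {   /* keep large-magnitude weights */
                vals[nnz] = v;
                col_idx[nnz] = c;
                nnz++;
            }
        }
    }
    row_ptr[rows] = nnz;
    return nnz;
}

int main(void)
{
    float w[2 * 3] = { 0.9f, -0.01f, 0.3f, 0.002f, -0.7f, 0.05f };
    float vals[6]; int col_idx[6], row_ptr[3];
    int nnz = prune_to_csr(w, 2, 3, 0.1f, vals, col_idx, row_ptr);
    printf("kept %d of 6 weights (%.0f%% pruned)\n",
           nnz, 100.0 * (6 - nnz) / 6);
    return 0;
}
```

In a full pipeline the sparse network would then be retrained, as the abstract describes, so that accuracy recovers after pruning.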
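The 32-bit-float to 8-bit signed char quantization can be sketched as a symmetric linear mapping. The max-abs scale derivation below is one common choice and is an assumption; the abstract does not specify the thesis's exact quantization scheme.

```c
/* Minimal sketch: symmetric linear quantization, float32 -> int8.
 * Scale = max|x| / 127 is an assumed (common) per-tensor choice. */
#include <stdio.h>
#include <math.h>

/* Quantize: q = round(x / scale), clamped to [-127, 127]. */
static void quantize_int8(const float *x, signed char *q, int n,
                          float *scale)
{
    float max_abs = 0.0f;
    for (int i = 0; i < n; i++)
        if (fabsf(x[i]) > max_abs) max_abs = fabsf(x[i]);
    *scale = (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;
    for (int i = 0; i < n; i++) {
        int v = (int)lroundf(x[i] / *scale);
        if (v >  127) v =  127;
        if (v < -127) v = -127;
        q[i] = (signed char)v;
    }
}

int main(void)
{
    float w[4] = { 0.50f, -1.27f, 0.01f, 1.00f };
    signed char q[4]; float scale;
    quantize_int8(w, q, 4, &scale);
    for (int i = 0; i < 4; i++)   /* dequantized value ~= original */
        printf("%+.2f -> %4d (back: %+.3f)\n", w[i], q[i], q[i] * scale);
    return 0;
}
```

Storing 8-bit values instead of 32-bit floats is what yields the roughly 4X compression ratio reported in the abstract.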
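FPGA acceleration of convolution typically maps a loop nest like the one below onto a pipelined, unrolled datapath. This is a plain C sketch with HLS-style directives shown only as comments; the kernel size, toy dimensions, and unroll strategy are illustrative and not the thesis's actual design.

```c
/* Minimal sketch: the loop structure commonly mapped to a
 * parallel-pipelined FPGA convolution engine. HLS pragmas appear
 * as comments; sizes are toy values, not the thesis's design. */
#include <stdio.h>

#define K 3                      /* kernel size */
#define W_IN 8                   /* toy input width/height */
#define W_OUT (W_IN - K + 1)

void conv2d(const float in[W_IN][W_IN], const float k[K][K],
            float out[W_OUT][W_OUT])
{
    for (int y = 0; y < W_OUT; y++) {
        for (int x = 0; x < W_OUT; x++) {
            /* #pragma HLS PIPELINE II=1  -- one output per cycle */
            float acc = 0.0f;
            for (int ky = 0; ky < K; ky++)
                for (int kx = 0; kx < K; kx++)
                    /* #pragma HLS UNROLL -- K*K multipliers in parallel */
                    acc += in[y + ky][x + kx] * k[ky][kx];
            out[y][x] = acc;
        }
    }
}

int main(void)
{
    float in[W_IN][W_IN], k[K][K], out[W_OUT][W_OUT];
    for (int i = 0; i < W_IN; i++)
        for (int j = 0; j < W_IN; j++) in[i][j] = 1.0f;
    for (int i = 0; i < K; i++)
        for (int j = 0; j < K; j++) k[i][j] = 1.0f / (K * K);
    conv2d(in, k, out);
    printf("out[0][0] = %.2f\n", out[0][0]); /* mean of 3x3 patch = 1.00 */
    return 0;
}
```

On hardware, pipelining the output loop and unrolling the kernel loops is one standard way to overlap data caching with the multiply-accumulate work, which is the kind of optimization the abstract attributes to the proposed scheme.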
Keywords/Search Tags: neural network, compression, hardware acceleration, FPGA