Hardware Accelerator Design Of Convolutional Neural Networks For Low Power And High Performance

Posted on: 2020-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Zhang
Full Text: PDF
GTID: 2428330578459444
Subject: Electronic and communication engineering
Abstract/Summary:
Convolutional neural networks (CNNs) have been widely studied at home and abroad because of their excellent performance in image recognition, speech recognition, and autonomous driving. As their performance and accuracy improve, the number of layers and the amount of computation in a CNN increase significantly. After the rectified linear unit (ReLU), more than 50% of the data in a CNN is zero. Computing on zero data does not change the result, but it consumes considerable energy and computation cycles. Compressing zero data and reusing the compressed data have therefore become urgent problems. This thesis takes these problems as its research object and works on the following four aspects: (1) improving transmission efficiency with a multipath packet connection circuit; (2) reducing power consumption and computation cycles with a non-restore compression coding and decoding method; (3) improving data utilization efficiency with a coding for line reuse method; (4) designing a low-power, high-performance CNN hardware accelerator.

The main tasks are as follows:

(1) Designing the multipath packet connection circuit. The (X,Y) routing algorithm of the traditional packet connection circuit (PCC) reaches only a small number of destination nodes per multicast transmission, which lowers transmission efficiency, and it cannot support multiple transmission modes at the same time. The PCC therefore cannot meet the large data volumes and complex transmission modes of each CNN layer. This thesis designs a multipath packet connection circuit to improve transmission efficiency in CNNs. The circuit uses two multicast input channels and one unicast output channel so that input, calculation, and output proceed independently of one another, and it combines the multicast judgment mechanism with the receiving module to support multiple transmission modes. The experimental results show that, compared with the traditional PCC, the multicast transmission channel setup time is reduced by 60.4% and the multicast packet transmission time improves by 2.53x.

(2) Designing the non-restore compression coding and decoding circuit. Traditional coding and decoding methods have three problems in the CNN field: a low compression rate, the need to restore data during transmission and calculation, and the inability to actually skip the calculation of "0" values. To solve them, this thesis designs a non-restore compression coding and decoding circuit. When encoding, following the characteristics of the convolution calculation, the coding circuit applies 0/1 encoding to each input value to filter out zero values and records the number of valid values per line. When decoding, the code and the corresponding valid values are sent to the calculation module according to the number of valid values, and the calculation module performs shift decoding on the code to skip the "0" calculations. The experimental results show that the total compression rate is 58.91%. For each layer, the highest compression rate is 48.64%. Compared with Eyeriss, the speedup is 17.7x.

(3) Designing the coding for line reuse method. Because no data reuse method currently exists for coded data, this thesis adopts coding for line reuse. The design takes full advantage of the reduced volume of compressed data, exploits the line data reuse that arises as the convolution kernel slides down the input feature map, and uses time-sharing reuse to improve the utilization of encoded data. The experimental results show that, compared with Eyeriss's fixed line multiplexing, off-chip memory reads and writes are reduced by 45%.

(4) Designing the low-power, high-performance CNN hardware accelerator. Based on the multipath packet connection circuit, the non-restore compression coding method, and the coding for line reuse method, this thesis designs a low-power, high-performance CNN hardware accelerator, including its multipath packet connection circuit, coding circuit, control circuit, transmission circuit, and calculation circuit. The accelerator uses a configuration chain to control the calculation parameters of each convolution layer. The experimental results show that the calculation speed is 14.8x that of Eyeriss.
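
The 0/1 coding and zero skipping described in task (2) can be illustrated with a minimal software sketch. This is not the thesis circuit: the function names encode_row and sparse_dot, the use of a Python integer as the 0/1 code, and the sample values are all assumptions chosen for illustration. The sketch only shows the idea that a line can be stored as a code plus its valid (non-zero) values, and that the multiply-accumulate can shift through the code without restoring the zeros.

```python
from typing import List, Tuple


def encode_row(row: List[int]) -> Tuple[int, List[int], int]:
    """Encode one line of activations (hypothetical helper).

    Returns (code, valid_values, valid_count), where bit i of `code`
    is 1 when row[i] is non-zero and 0 otherwise.
    """
    code = 0
    valid = []
    for i, value in enumerate(row):
        if value != 0:
            code |= 1 << i
            valid.append(value)
    return code, valid, len(valid)


def sparse_dot(code: int, valid: List[int], weights: List[int]) -> Tuple[int, int]:
    """Multiply-accumulate directly on the encoded line (hypothetical helper).

    The code is shifted one bit at a time; a multiplication is issued
    only for 1 bits, so zero activations are never restored or multiplied.
    Returns (accumulated_result, multiplications_performed).
    """
    acc = 0
    mults = 0
    pos = 0  # position in the original, uncompressed line
    k = 0    # index of the next valid value
    while code:
        if code & 1:
            acc += valid[k] * weights[pos]
            k += 1
            mults += 1
        code >>= 1
        pos += 1
    return acc, mults


# Illustrative data: 8 activations after ReLU, 5 of them zero.
row = [0, 3, 0, 0, 5, 0, 2, 0]
weights = [1, 2, 3, 4, 5, 6, 7, 8]

code, valid, count = encode_row(row)
result, mults = sparse_dot(code, valid, weights)
print(bin(code), valid, count)  # 0b1010010 [3, 5, 2] 3
print(result, mults)            # 45 3
```

In this example the dense dot product would need 8 multiplications, while the encoded form needs 3 and gives the same result. How the actual circuit stores the code and schedules the shifts is not specified in the abstract.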
Keywords/Search Tags:Non-restore compression, Coding for line reuse method, Convolutional Neural Network, Hardware Acceleration