
Operation Unit Optimization Of LSTM Hardware Accelerator

Posted on: 2021-05-12    Degree: Master    Type: Thesis
Country: China    Candidate: Y Zha    Full Text: PDF
GTID: 2428330647950666    Subject: Electronic and communication engineering
Abstract/Summary:
In recent years, with the growth of computer processing power and the huge volumes of training data generated in the Internet era, machine learning, and deep learning in particular, has achieved breakthroughs in an increasing number of intelligent applications. As a variant of the RNN architecture, LSTM solves the exploding and vanishing gradient problems that RNNs suffer during backpropagation, which has made it very popular in natural language processing and related fields. In practical engineering, however, many problems remain. Traditional computing platforms cannot support the enormous volume of computation an LSTM requires. Moreover, in embedded applications, and especially in latency-sensitive fields such as autonomous driving, the LSTM model's huge parameter count and massive training and inference data lead not only to high computational complexity during training and inference, but also to high power consumption on the computing platform. To address these problems, this thesis studies how to accelerate a compressed LSTM model in hardware on an FPGA: redeploying the pruned sparse network, solving the load imbalance among its PE (processing element) units, and optimizing the evaluation of the nonlinear functions.

First, this thesis surveys the state of LSTM hardware accelerator research in China and abroad, analyzing and classifying the innovations and progress made; the focus of this work is accelerating and optimizing the PE unit. Second, building on sparse neural network models from prior work, a customized hardware design is carried out, including reordering the parameter distribution at the software level and designing a configurable hardware unit suited to sparse matrix multiplication, fully exploiting parameter sparsity to save computation time and power.

Then an FPGA prototype verification platform is built. Stimulus data is generated on the PC host; weight data is produced by GPU training and then pruned against a threshold to yield sparse weights. The parameters and stimulus data are transmitted to the FPGA development board over a network interface, the hardware logic is written and simulated, and its accuracy is compared against a MATLAB software model. The resource consumption, power consumption, and other performance metrics of the hardware accelerator are then analyzed.

The results show that, compared with ESE, CPU, GPU, and other hardware platforms, the designed computing unit improves both performance and power consumption. Against the CPU it achieves a 25.09x speedup while consuming only 18% of the CPU's power; in energy efficiency it reaches 64.19x that of the GPU and 4.76x that of ESE. In addition, multiple weight matrices of varying sparsity were generated in MATLAB as test objects, so that operation performance could be compared across sparsity levels. The results show that the improved method of this thesis is best suited to matrices with roughly 48% sparsity: for a 1024 × 153 test matrix at a 100 MHz system clock, it reduces the operation time by 665 ns.
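For reference, the standard LSTM cell equations in textbook notation (this formulation is not drawn from the thesis itself): the sigmoid and tanh are the nonlinear functions whose hardware evaluation the thesis optimizes, and the matrix-vector products with W and U are where the sparse PE units do their work.

    f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)
    i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)
    o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)
    \tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)
    c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
    h_t = o_t \odot \tanh(c_t)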
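A minimal sketch of the threshold-pruning step described above, assuming magnitude-based pruning with numpy; the function name and the threshold value are illustrative, not taken from the thesis.

    import numpy as np

    def prune_by_threshold(weights, threshold):
        # Zero out weights whose magnitude falls below the threshold,
        # producing the sparse weight matrix deployed to the FPGA.
        mask = np.abs(weights) >= threshold
        return weights * mask, mask

    # Illustrative 1024 x 153 weight matrix; with standard-normal weights,
    # a threshold of 0.65 prunes roughly 48% of the entries.
    rng = np.random.default_rng(0)
    W = rng.standard_normal((1024, 153))
    W_sparse, mask = prune_by_threshold(W, threshold=0.65)
    print(f"sparsity: {1.0 - mask.mean():.2%}")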
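The software-level reordering of the parameter distribution can be pictured as follows: assign matrix rows to PEs so that each PE receives a similar number of nonzero multiplications, since with a naive interleaving the slowest (most heavily loaded) PE dominates latency. The greedy heuristic below is a sketch of the general load-balancing technique, not the thesis's exact scheme.

    import numpy as np

    def assign_rows_to_pes(W_sparse, num_pes):
        # Greedily assign rows to PEs so nonzero counts (and thus multiply
        # workloads) stay balanced; heaviest rows are placed first.
        nnz_per_row = np.count_nonzero(W_sparse, axis=1)
        order = np.argsort(nnz_per_row)[::-1]        # heaviest rows first
        loads = np.zeros(num_pes, dtype=int)
        assignment = {pe: [] for pe in range(num_pes)}
        for row in order:
            pe = int(np.argmin(loads))               # least-loaded PE so far
            assignment[pe].append(int(row))
            loads[pe] += nnz_per_row[row]
        return assignment, loads

    rng = np.random.default_rng(1)
    W = rng.standard_normal((1024, 153))
    W[np.abs(W) < 0.65] = 0.0                        # ~48% sparse
    assignment, loads = assign_rows_to_pes(W, num_pes=32)
    print("max/min nonzeros per PE:", loads.max(), loads.min())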
Keywords/Search Tags: LSTM, neural network model compression, hardware acceleration of sparse neural networks, load balancing