Research On FPGA Acceleration Of Neural Network Algorithm

Posted on: 2020-12-30
Degree: Master
Type: Thesis
Country: China
Candidate: S R Wang
Full Text: PDF
GTID: 2428330599951905
Subject: Microelectronics and Solid State Electronics

Abstract/Summary:
In recent years, the development of neural network algorithms has driven the rise of artificial intelligence. Neural networks have achieved great success in computer vision and natural language processing, performing tasks such as image recognition, object detection, speech recognition, and machine translation. Because of their large computational complexity, neural network algorithms must be accelerated to meet the requirements of fast data processing and real-time applications.

There are three main platforms for accelerating neural network algorithms: the GPU, the ASIC, and the FPGA. The GPU offers excellent computing performance but has high power consumption and is mainly used to accelerate the training of neural networks. The ASIC can achieve higher energy efficiency, but it has a long development cycle, high development cost, and poor flexibility. Compared with the GPU, the FPGA achieves higher energy efficiency, and compared with the ASIC it is more flexible, since it can be reconfigured to accelerate different algorithms. Therefore, this thesis uses the FPGA for hardware-acceleration research on neural network algorithms. In addition, as high-level synthesis (HLS) technology has matured, FPGA development has become more convenient. This thesis uses OpenCL for FPGA development; compared with traditional development in the Verilog hardware description language, the development cycle is greatly shortened and the FPGA implementation of a neural network algorithm can be completed more easily.

This thesis studies FPGA acceleration of neural network algorithms in two areas: convolutional neural networks and LSTM neural networks. For convolutional neural networks, the functional layers are designed from two aspects: optimizing memory access and increasing the degree of parallel computing. The computation of the convolutional and fully connected layers is accelerated in parallel by a matrix-multiplication module (an illustrative kernel sketch is given after this abstract). A line-buffer structure lets the pooling layer be processed in a pipelined fashion (also sketched below). A parallel pipelined execution strategy is proposed to improve the efficiency of each functional module and the overall performance of the system. The design achieves a throughput of 73.26 GOPS for a convolutional neural network on a Stratix-V GXA7 FPGA.

For FPGA acceleration of the LSTM network, this thesis not only carries out the hardware-acceleration design but also proposes a hardware-friendly structured pruning algorithm (a simplified sketch follows the abstract). The proposed structured pruning achieves good model compression and at the same time eliminates the irregular memory accesses and computations that unstructured sparsity would cause. The thesis then designs massively parallel processing units for the LSTM network and applies coarse-grained layer-pipeline optimization, which improves overall computational efficiency and accelerates the network. The resulting design accelerates both a language model and an acoustic model: on the 8x-compressed language model it achieves an effective computational throughput of 681.6 GOPS and a computational efficiency of 946.63%, and on the 4x-compressed acoustic model it achieves 339.7 GOPS and 482.61%. (Efficiencies above 100% are possible because the effective throughput is measured against the operations of the uncompressed model.)
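The following is a minimal OpenCL kernel sketch of the kind of tiled matrix-multiplication module described above for the convolutional and fully connected layers. It is illustrative only, not the thesis's actual kernel; the kernel name, the TILE size, and the row-major layout are assumptions. Each work-group stages TILE x TILE tiles of the operands in on-chip local memory to cut external-memory traffic, and the fully unrolled inner loop maps to a grid of parallel multiply-accumulate units on the FPGA.

// Hypothetical tiled matrix-multiply kernel (a sketch, not the thesis's code).
// Computes C = A * B for an M x K matrix A and a K x N matrix B, row-major.
// The host must launch it with a TILE x TILE work-group size.
#define TILE 16

__kernel void gemm_tiled(__global const float* restrict A,
                         __global const float* restrict B,
                         __global float* restrict C,
                         const int M, const int N, const int K)
{
    __local float Asub[TILE][TILE];
    __local float Bsub[TILE][TILE];

    const int row  = get_global_id(1);   // row of C this work-item computes
    const int col  = get_global_id(0);   // column of C this work-item computes
    const int lrow = get_local_id(1);
    const int lcol = get_local_id(0);

    float acc = 0.0f;
    const int num_tiles = (K + TILE - 1) / TILE;

    for (int t = 0; t < num_tiles; ++t) {
        // Stage one tile of A and B in local memory; guard the matrix edges.
        const int a_col = t * TILE + lcol;
        const int b_row = t * TILE + lrow;
        Asub[lrow][lcol] = (row < M && a_col < K) ? A[row * K + a_col] : 0.0f;
        Bsub[lrow][lcol] = (b_row < K && col < N) ? B[b_row * N + col] : 0.0f;
        barrier(CLK_LOCAL_MEM_FENCE);

        // Multiply the staged tiles; the unrolled loop becomes parallel MACs.
        #pragma unroll
        for (int k = 0; k < TILE; ++k)
            acc += Asub[lrow][k] * Bsub[k][lcol];
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    if (row < M && col < N)
        C[row * N + col] = acc;
}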
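The pooling pipeline can be sketched in a similar way. The single work-item kernel below (a style idiomatic for Intel FPGA OpenCL) implements 2x2 max pooling with stride 2 using a one-row line buffer, so every input pixel is read from external memory exactly once and one pixel is consumed per pipelined loop iteration. The 2x2 window, stride 2, single channel, even width, and the MAX_W bound are assumptions for illustration, not the thesis's exact parameters.

// Hypothetical single work-item line-buffer pooling kernel (a sketch).
// Assumes W <= MAX_W and W, H even; one channel of the feature map.
#define MAX_W 1024

__kernel void maxpool2x2(__global const float* restrict in,
                         __global float* restrict out,
                         const int H, const int W)
{
    float linebuf[MAX_W];   // column-wise state carried from the even row

    for (int y = 0; y < H; ++y) {
        for (int x = 0; x < W; ++x) {
            const float v = in[y * W + x];
            if ((y & 1) == 0) {
                // Even row: store the pixel for comparison with the next row.
                linebuf[x] = v;
            } else {
                // Odd row: fold in the pixel above this one...
                const float vert = fmax(v, linebuf[x]);
                linebuf[x] = vert;
                // ...and on odd columns emit the max of the full 2x2 window.
                if ((x & 1) == 1) {
                    const float win = fmax(linebuf[x - 1], vert);
                    out[(y / 2) * (W / 2) + (x / 2)] = win;
                }
            }
        }
    }
}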
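Finally, a sketch of structured pruning on the host side, written in C (the language of an OpenCL host program). It zeroes whole rows of a weight matrix whose L2 norms fall below a threshold chosen to reach a target sparsity; removing entire rows rather than scattered weights is what keeps memory access regular in hardware. This is one common structured-pruning scheme and not necessarily the exact criterion or granularity the thesis proposes.

/* Hypothetical row-wise magnitude pruning of a weight matrix (a sketch). */
#include <math.h>
#include <stdlib.h>
#include <string.h>

static int cmp_float(const void* a, const void* b) {
    float fa = *(const float*)a, fb = *(const float*)b;
    return (fa > fb) - (fa < fb);
}

/* Zero the fraction `sparsity` of rows with the smallest L2 norms.
 * W is rows x cols, row-major. Returns the number of rows zeroed. */
int prune_rows(float* W, int rows, int cols, float sparsity)
{
    float* norms  = malloc(rows * sizeof(float));
    float* sorted = malloc(rows * sizeof(float));

    /* Per-row L2 norm as the importance score. */
    for (int r = 0; r < rows; ++r) {
        float s = 0.0f;
        for (int c = 0; c < cols; ++c)
            s += W[r * cols + c] * W[r * cols + c];
        norms[r] = sqrtf(s);
    }

    /* Threshold = the cut-th smallest norm for the target sparsity. */
    memcpy(sorted, norms, rows * sizeof(float));
    qsort(sorted, rows, sizeof(float), cmp_float);
    const int   cut    = (int)(sparsity * rows);
    const float thresh = (cut > 0) ? sorted[cut - 1] : -1.0f;

    /* Zero whole rows at or below the threshold (capped to handle ties). */
    int removed = 0;
    for (int r = 0; r < rows; ++r) {
        if (norms[r] <= thresh && removed < cut) {
            memset(&W[r * cols], 0, cols * sizeof(float));
            ++removed;
        }
    }
    free(norms);
    free(sorted);
    return removed;
}

In practice the pruned model is retrained to recover accuracy, and only the surviving rows are stored and streamed to the FPGA, which is where the 8x and 4x compression figures quoted above come from.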
Keywords/Search Tags: FPGA, hardware acceleration, neural network, convolutional neural network, LSTM, structured pruning