
Design And Research Of FPGA Multi-threading Accelerator System For Convolutional Neural Network

Posted on: 2023-10-26   Degree: Master   Type: Thesis
Country: China   Candidate: X G Ma   Full Text: PDF
GTID: 2568306848981209   Subject: Electronics and information engineering
Abstract/Summary:
As one of the most important algorithms in deep learning, the convolutional neural network has a complex network structure and powerful feature learning and feature expression abilities, and has been widely applied in many fields such as computer vision, natural language processing, and big data analysis. Deep learning applications based on convolutional neural networks involve two task phases, training and inference. Both phases are compute-intensive and at present run mostly on high-performance processors and clusters, such as CPUs, GPUs, and servers. Convolutional neural networks are now also widely used in smartphones, autonomous driving, and the Internet of Things, where a prominent problem arises: huge network models and excessive power consumption do not match the capabilities of these hardware devices. The design of convolutional neural network accelerators with a high energy efficiency ratio has therefore become a major research topic. Aiming at the above problems, this paper studied the compression and hardware acceleration of convolutional neural networks and proposed a multi-threaded accelerator design based on the FPGA (Field Programmable Gate Array) platform. The main research contents are as follows.

Firstly, for convolutional neural network computation, and in view of the redundancy in common network model calculations, this paper analyzed the network optimization methods and parameter optimization problems in depth. For network model optimization, batch normalization was added as part of the model structure on top of the existing network and the network layers were redesigned, which effectively alleviated the gradient explosion problem, accelerated network convergence, and reduced the number of training iterations. For the inference task, to address the long computation time caused by redundant network parameters, a channel selection algorithm based on LASSO regression was proposed, in which redundant channels are eliminated by minimizing the reconstruction error of the output features. In addition, to compute faster on the hardware platform, a Q-format dynamic fixed-point quantization of the floating-point parameters was adopted, which further compressed the storage occupied by the network parameters and improved computational efficiency.

Secondly, the parallelism of convolutional neural network computation was exploited through multi-threading. For the computation tasks of the convolutional layers, by studying accelerators for conventional spatial convolution, the computational characteristics and data transmission problems of the inference stage were analyzed in depth, and a multi-threaded parallel computing architecture was proposed that exploits parallelism at the parameter level, the channel level, and the layer level. Based on a pipelined computing strategy, buffering was adopted for the same convolution operation, which further reduced the latency of data exchange with external devices and made efficient use of the FPGA's computational resources. In terms of data storage, this paper analyzed the off-chip access volume under different blocking schemes: the output height and the output channels of the convolution were divided into blocks (tiles), and the access volume under different data reuse conditions was analyzed, which provided theoretical support for the data access design of the hardware acceleration structure.
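As a concrete illustration of the blocking analysis above, the following Python sketch estimates the off-chip access volume of one convolutional layer when the output height and output channels are tiled, under two data reuse policies. The layer sizes, tile sizes, and the simple access model are illustrative assumptions for this sketch, not figures from the thesis.

```python
import math

def offchip_access(H, W, Cin, Cout, K, Th, Tc, reuse="weights"):
    """Estimate off-chip data volume (in elements) for one conv layer
    whose output height is tiled into blocks of Th rows and whose
    output channels are tiled into blocks of Tc channels.

    reuse="weights": a channel tile's weights stay on chip while all
    spatial tiles are processed, so inputs are re-fetched per channel tile.
    reuse="inputs": an input tile stays on chip while all channel tiles
    are processed, so weights are re-fetched per spatial tile.
    Stride 1 and 'same' padding are assumed for simplicity.
    """
    n_h = math.ceil(H / Th)              # number of spatial tiles
    n_c = math.ceil(Cout / Tc)           # number of output-channel tiles
    in_tile = (Th + K - 1) * W * Cin     # input elements per spatial tile
    w_all = K * K * Cin * Cout           # total weight elements
    out_all = H * W * Cout               # outputs are written once either way
    if reuse == "weights":
        return n_c * n_h * in_tile + w_all + out_all
    return n_h * in_tile + n_h * w_all + out_all

# Illustrative comparison for a VGG16-like middle layer (assumed sizes).
for reuse in ("weights", "inputs"):
    v = offchip_access(H=56, W=56, Cin=128, Cout=256, K=3,
                       Th=8, Tc=64, reuse=reuse)
    print("reuse=%-7s  %.2f M elements" % (reuse, v / 1e6))
```

Comparing the two totals for a given layer shows which reuse policy minimizes external memory traffic for a given tile shape, which is the kind of trade-off the blocking analysis quantifies.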
Finally, a design scheme for convolutional neural network computation and the allocation of hardware resources among modules was proposed. The design was implemented on the Xilinx XC7Z020 embedded platform. Aiming at the parallelism of the convolution operation, a multi-threaded network computing architecture was proposed to realize parallel computation over convolution sliding windows and output channels. The architecture links the computing array into a linear pipeline, with each thread processing one sliding window: parallel computation over multiple output dimensions is realized within a thread, and parallel computation over sliding windows is realized between threads. Moreover, the scheme reuses feature maps and weights between threads according to the different blocking methods, which reduces the demand for on-chip memory and access bandwidth. The experiments took AlexNet and VGG16 as the target networks in an assumed image recognition application scenario. The proportion of FPGA computing resources occupied under the different network structures was analyzed, and the results were compared with CPU- and GPU-based implementations of the same convolutional neural networks. Compared with the CPU scheme, this design significantly improved computational efficiency; compared with the GPU scheme, it reduced the system's power consumption and effectively improved the energy efficiency ratio without greatly reducing computational efficiency.
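The thread partitioning described above can be summarized with a small behavioral reference model: each task handles one sliding-window position, and all output channels for that window are computed together inside the task (vectorized here, standing in for the parallel MAC array of the hardware design). The thread count, tensor shapes, and the use of Python software threads are illustrative assumptions; the thesis describes a hardware architecture, not software threading.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def conv_multithread(x, w, n_threads=4):
    """Reference model of window-level parallelism between threads and
    output-channel parallelism within a thread.

    x: input feature map (Cin, H, W); w: weights (Cout, Cin, K, K).
    Stride 1, no padding; all sizes are illustrative assumptions.
    """
    Cin, H, W = x.shape
    Cout, _, K, _ = w.shape
    Ho, Wo = H - K + 1, W - K + 1
    out = np.zeros((Cout, Ho, Wo), dtype=x.dtype)
    wf = w.reshape(Cout, -1)                 # flatten kernels once

    def window(idx):
        i, j = divmod(idx, Wo)               # one sliding-window position
        patch = x[:, i:i + K, j:j + K].reshape(-1)
        out[:, i, j] = wf @ patch            # all output channels at once

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(window, range(Ho * Wo)))
    return out

# Quick shape check on a small random case.
x = np.random.randn(3, 8, 8).astype(np.float32)
w = np.random.randn(16, 3, 3, 3).astype(np.float32)
print(conv_multithread(x, w).shape)          # (16, 6, 6)
```

On the FPGA, the same split maps sliding windows to hardware threads and output channels to parallel multipliers within a thread, which is what enables the inter-thread feature-map and weight reuse mentioned above.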
Keywords/Search Tags: FPGA, convolutional neural network accelerator, parallel computing, multi-threaded computing architecture