
Research On FPGA-Based Convolutional Neural Network Acceleration And Performance Optimization

Posted on: 2024-03-04    Degree: Master    Type: Thesis
Country: China    Candidate: Z. S. Xie    Full Text: PDF
GTID: 2568307139958809    Subject: Electronic information
Abstract/Summary:
In recent years, with the continuous development of technology, deep learning has received widespread attention in the field of artificial intelligence. Among deep learning models, the Convolutional Neural Network (CNN) enables machines to acquire human-like analysis capabilities by learning patterns from samples, so CNNs are widely used in image detection and object recognition. To solve more complex problems, the depth of CNNs has grown steadily, demanding computing power and memory bandwidth that general-purpose processors can no longer provide. To accelerate CNNs, many researchers have therefore turned to hardware such as the Graphics Processing Unit (GPU), the Application-Specific Integrated Circuit (ASIC), and the Field-Programmable Gate Array (FPGA). Among these, the FPGA holds broad promise for CNN deployment thanks to its excellent flexibility, strong parallel computing capability, and low power consumption.

This thesis analyzes the structure of the LeNet convolutional neural network, the connections between its layers, and the configuration of each layer. To optimize accelerator performance, the design is completed with High-Level Synthesis (HLS). Storage consumption is reduced by pruning and compressing the network's parameters, and the fixed-point quantization is parameterized so that word lengths can be changed easily, improving the accelerator's portability. On this basis, to overcome the accelerator's memory-access and computation-speed limits, three software/hardware co-design schemes are developed for the Zynq platform. According to the different ways data are accessed and stored, a serial accelerator, an UNROLL accelerator, and a PIPELINE accelerator are designed for the same network, exploring acceleration from multiple perspectives while increasing the accelerator's versatility. Exploiting the characteristics of the hardware, pipelining and adder-tree algorithms are added to the network computation, increasing parallelism, raising data throughput, and effectively improving accelerator speed. Considering practical constraints, weight parameters are stored and computed in 32-bit fixed point, and the fixed-point word length is parameterized to enhance the portability of the neural network.

The power consumption and speed of the three accelerators are analyzed. Experimental results show that, compared with a CPU, the serial accelerator reduces power consumption by 92.63% and increases speed by 3.682 times; the UNROLL accelerator reduces power consumption by 92.02% and increases speed by 4.680 times; and the PIPELINE accelerator reduces power consumption by 91.37% and increases speed by 70.387 times. Compared with a GPU, the serial, UNROLL, and PIPELINE accelerators reduce power consumption by 95.74%, 95.38%, and 95.01%, respectively. The FPGA-based convolutional neural network accelerator designed in this thesis is therefore fast and energy-efficient, making it suitable for deployment on low-power mobile devices.
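As a rough illustration of the parameterized fixed-point scheme described above, the following sketch uses the AMD/Xilinx HLS ap_fixed type. The 32-bit total width mirrors the abstract, but the split into integer and fractional bits (W_INT) and all type names are assumptions; the thesis's exact types are not given here.

```cpp
// Hypothetical sketch: parameterizing fixed-point word lengths in HLS C++.
// ap_fixed<TOTAL_BITS, INT_BITS> is the AMD/Xilinx HLS arbitrary-precision
// fixed-point type. Changing the two constants below re-derives the whole
// datapath, which is the portability mechanism the abstract describes.
#include "ap_fixed.h"

constexpr int W_TOTAL = 32;  // total word length (matches the 32-bit storage in the abstract)
constexpr int W_INT   = 16;  // integer bits (assumption; not specified in the abstract)

typedef ap_fixed<W_TOTAL, W_INT>         weight_t;
typedef ap_fixed<W_TOTAL, W_INT>         act_t;
typedef ap_fixed<2 * W_TOTAL, 2 * W_INT> acc_t;  // widened accumulator to avoid overflow

// Multiply-accumulate kernel written against the parameterized types.
acc_t mac(act_t x, weight_t w, acc_t acc) {
    return acc + x * w;
}
```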
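The serial, UNROLL, and PIPELINE variants mentioned above differ mainly in how the convolution loops are scheduled, and the adder tree shortens the critical path of the reduction. Below is a minimal sketch of both ideas, assuming Vitis HLS pragmas and a 5x5 LeNet-style kernel; the function name conv_window and the buffer layout are illustrative, not taken from the thesis.

```cpp
// Hypothetical sketch: one convolution window, scheduled for parallelism.
#include "ap_fixed.h"
typedef ap_fixed<32, 16> act_t;     // word lengths are assumptions, as in the previous sketch
typedef ap_fixed<32, 16> weight_t;
typedef ap_fixed<64, 32> acc_t;     // widened accumulator

#define K 5  // 5x5 kernels, as in LeNet

acc_t conv_window(const act_t window[K][K], const weight_t kernel[K][K]) {
#pragma HLS INLINE off
    acc_t partial[K];
#pragma HLS ARRAY_PARTITION variable=partial complete
    // Each row's products are computed in parallel once the loops are unrolled...
    for (int i = 0; i < K; i++) {
#pragma HLS UNROLL
        acc_t row_sum = 0;
        for (int j = 0; j < K; j++) {
#pragma HLS UNROLL
            row_sum += window[i][j] * kernel[i][j];
        }
        partial[i] = row_sum;
    }
    // ...then combined. With the loop fully unrolled, HLS expression
    // balancing can map this reduction to an adder tree rather than a
    // serial chain of adders, which is the adder-tree idea in the abstract.
    acc_t sum = 0;
    for (int i = 0; i < K; i++) {
#pragma HLS UNROLL
        sum += partial[i];
    }
    return sum;
}
```

Removing the UNROLL pragmas recovers a serial schedule, while adding `#pragma HLS PIPELINE II=1` to the outer loop that walks over output pixels would correspond to the PIPELINE variant; the exact pragma placement in the thesis's three designs is not stated in the abstract.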
Keywords/Search Tags: FPGA, Deep learning, Convolutional neural network, Adder tree, Pipeline