
Research On FPGA-Based Convolutional Neural Network Acceleration And Performance Optimization

Posted on: 2024-03-04    Degree: Master    Type: Thesis
Country: China    Candidate: Z. S. Xie    Full Text: PDF
GTID: 2568307139958809    Subject: Electronic information
Abstract/Summary:
In recent years, with the continuous development of technology, deep learning has received widespread attention in the field of artificial intelligence. Among deep learning models, the Convolutional Neural Network (CNN) enables machines to acquire human-like analysis capabilities by learning patterns from samples, so CNNs are widely used in image detection and object recognition. To solve more complex problems, the depth of CNNs has grown steadily, demanding computing power and memory bandwidth that general-purpose processors can no longer provide. To accelerate CNNs, many researchers have therefore turned to hardware such as the Graphics Processing Unit (GPU), the Application-Specific Integrated Circuit (ASIC), and the Field-Programmable Gate Array (FPGA). Among these, the FPGA holds broad promise for CNN deployment thanks to its excellent flexibility, strong parallel computing capability, and low power consumption.

This thesis analyzes the structure of the LeNet convolutional neural network, the connections between its layers, and the configuration of each layer. To optimize accelerator performance, the design is completed with High-Level Synthesis (HLS). Storage consumption is reduced by pruning and compressing the network's parameters, and the fixed-point quantization is parameterized so that word lengths can be changed easily, improving the accelerator's portability. On this basis, to overcome the accelerator's memory-access and computation-speed limits, three software/hardware co-design schemes are developed for the Zynq platform. According to the different ways data are accessed and stored, a serial accelerator, an UNROLL accelerator, and a PIPELINE accelerator are designed for the same network, exploring acceleration from multiple perspectives while increasing the accelerator's versatility. Exploiting the characteristics of the hardware, pipelining and adder-tree algorithms are added to the network computation, increasing parallelism, raising data throughput, and effectively improving accelerator speed. Considering practical constraints, weight parameters are stored and computed in 32-bit fixed point, and the fixed-point word length is parameterized to enhance the portability of the neural network.

The power consumption and speed of the three accelerators are analyzed. Experimental results show that, compared with a CPU, the serial accelerator reduces power consumption by 92.63% and increases speed by 3.682 times; the UNROLL accelerator reduces power consumption by 92.02% and increases speed by 4.680 times; and the PIPELINE accelerator reduces power consumption by 91.37% and increases speed by 70.387 times. Compared with a GPU, the serial, UNROLL, and PIPELINE accelerators reduce power consumption by 95.74%, 95.38%, and 95.01%, respectively. The FPGA-based convolutional neural network accelerator designed in this thesis is therefore fast and energy-efficient, making it suitable for deployment on low-power mobile devices.
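As a rough illustration of the parameterized fixed-point scheme described above, the following sketch uses the AMD/Xilinx HLS ap_fixed type. The 32-bit total width mirrors the abstract, but the split into integer and fractional bits (W_INT) and all type names are assumptions; the thesis's exact types are not given here.

```cpp
// Hypothetical sketch: parameterizing fixed-point word lengths in HLS C++.
// ap_fixed<TOTAL_BITS, INT_BITS> is the AMD/Xilinx HLS arbitrary-precision
// fixed-point type. Changing the two constants below re-derives the whole
// datapath, which is the portability mechanism the abstract describes.
#include "ap_fixed.h"

constexpr int W_TOTAL = 32;  // total word length (matches the 32-bit storage in the abstract)
constexpr int W_INT   = 16;  // integer bits (assumption; not specified in the abstract)

typedef ap_fixed<W_TOTAL, W_INT>         weight_t;
typedef ap_fixed<W_TOTAL, W_INT>         act_t;
typedef ap_fixed<2 * W_TOTAL, 2 * W_INT> acc_t;  // widened accumulator to avoid overflow

// Multiply-accumulate kernel written against the parameterized types.
acc_t mac(act_t x, weight_t w, acc_t acc) {
    return acc + x * w;
}
```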
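The serial, UNROLL, and PIPELINE variants mentioned above differ mainly in how the convolution loops are scheduled, and the adder tree shortens the critical path of the reduction. Below is a minimal sketch of both ideas, assuming Vitis HLS pragmas and a 5x5 LeNet-style kernel; the function name conv_window and the buffer layout are illustrative, not taken from the thesis.

```cpp
// Hypothetical sketch: one convolution window, scheduled for parallelism.
#include "ap_fixed.h"
typedef ap_fixed<32, 16> act_t;     // word lengths are assumptions, as in the previous sketch
typedef ap_fixed<32, 16> weight_t;
typedef ap_fixed<64, 32> acc_t;     // widened accumulator

#define K 5  // 5x5 kernels, as in LeNet

acc_t conv_window(const act_t window[K][K], const weight_t kernel[K][K]) {
#pragma HLS INLINE off
    acc_t partial[K];
#pragma HLS ARRAY_PARTITION variable=partial complete
    // Each row's products are computed in parallel once the loops are unrolled...
    for (int i = 0; i < K; i++) {
#pragma HLS UNROLL
        acc_t row_sum = 0;
        for (int j = 0; j < K; j++) {
#pragma HLS UNROLL
            row_sum += window[i][j] * kernel[i][j];
        }
        partial[i] = row_sum;
    }
    // ...then combined. With the loop fully unrolled, HLS expression
    // balancing can map this reduction to an adder tree rather than a
    // serial chain of adders, which is the adder-tree idea in the abstract.
    acc_t sum = 0;
    for (int i = 0; i < K; i++) {
#pragma HLS UNROLL
        sum += partial[i];
    }
    return sum;
}
```

Removing the UNROLL pragmas recovers a serial schedule, while adding `#pragma HLS PIPELINE II=1` to the outer loop that walks over output pixels would correspond to the PIPELINE variant; the exact pragma placement in the thesis's three designs is not stated in the abstract.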
Keywords/Search Tags: FPGA, Deep learning, Convolutional neural network, Adder tree, Pipeline