In recent years, convolutional neural networks (CNNs) have been widely adopted in computer vision tasks such as image classification, object detection, and scene segmentation due to their high accuracy. The classification accuracy of a CNN generally increases with network depth; however, as the network deepens, the model grows larger and the amount of computation required increases dramatically, making a pure software implementation of CNN inference very time-consuming. Various hardware accelerators have emerged to improve the computational performance of CNN models and to meet the real-time and low-power requirements of embedded devices. Among them, the field-programmable gate array (FPGA) has become an ideal platform for hardware acceleration owing to its powerful parallel computing capability, high energy efficiency, and high flexibility. In this paper, the Tiny YOLO algorithm, which has a typical CNN network structure, is implemented in hardware for acceleration, and an accelerator architecture based on a fine-grained image-tiling strategy is proposed. In the hardware design, a padding scheme applied to the line-buffer structure is proposed, which avoids the time-redundancy and space-redundancy problems of the software solution. To further improve the performance of the accelerator, this paper increases computational parallelism by changing the calculation order of the first layer, improves data-transfer efficiency by optimizing the line-buffer structure, and reduces system latency through the ping-pong technique and a fully pipelined design. The experimental results show that this design achieves 270.16 GOP/s at a 150 MHz working frequency. Compared with a CPU implementation, the speedup is 6x; compared with a GPU implementation, the performance-to-power ratio is 9x higher; more importantly, the accelerator shows a 1.3x~1.7x speedup over state-of-the-art FPGA-based designs.