
Research On FPGA-based Accelerator For Object Detection Neural Network

Posted on: 2024-03-04    Degree: Master    Type: Thesis
Country: China    Candidate: Z F Yue    Full Text: PDF
GTID: 2558307067468374    Subject: Software engineering
Abstract/Summary:
Object detection is one of the most important applications in the field of artificial intelligence, and object detection algorithms are now used in many aspects of daily life. However, deploying these algorithms on embedded devices is not easy. First, embedded devices usually have limited resources, such as memory, storage space, and processing power, and therefore cannot host large artificial intelligence models. Second, embedded devices generally run on limited power sources, which constrains the runtime and energy consumption of AI models. With these factors in mind, this paper focuses on deploying object detection networks on embedded devices and optimizes them in both software and hardware. The main research contributions are as follows:

1. Model quantization. To address the limited storage space and processing power of embedded devices, this paper studies model quantization on the software side. The YOLOv2-tiny model is quantized to int8, with symmetric quantization applied to the weights and asymmetric quantization applied to the feature maps (a sketch of the two schemes follows this abstract). Quantization not only reduces storage requirements but also speeds up data transmission and computation.

2. Designing a convolution accelerator. This paper implements convolution and pooling acceleration units on the FPGA. First, because of the limited BRAM capacity, the convolution is partitioned into tiles. The tiled convolutions are then unrolled along the input and output channels to enable parallel computation, and the five stages of the convolution are pipelined to improve the throughput of the convolution unit (the loop structure is sketched below). To improve memory access speed, a three-level memory hierarchy of SD card, DDR, and BRAM is built, with BRAM space allocated to the input feature maps, output feature maps, and weight parameters. Data reuse techniques are also studied to increase the number of times each datum is reused and to reduce the frequency of off-chip memory accesses. Together, these optimizations shorten the latency of convolution and pooling.

3. Building an object detection system. This paper implements an object detection system on the PYNQ-Z2. The tasks are first divided: the ARM side handles image pre-processing and post-processing, which are computation-heavy but poorly parallelizable, while the FPGA side handles convolution and pooling acceleration. The ARM and FPGA are connected by the AXI bus, and the ARM accelerates neural network inference by invoking the convolution and pooling IP cores (an example invocation is sketched below).

In summary, by optimizing both the hardware and the software, the YOLOv2-tiny network is successfully deployed on a resource-limited FPGA chip. By exploiting the heterogeneous ARM-FPGA architecture, the system makes full use of the PYNQ-Z2's resources and accelerates inference.
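The following is a minimal sketch of the two int8 quantization schemes named in point 1, assuming per-tensor granularity; the function names and the epsilon guard are illustrative assumptions, not the thesis's exact implementation.

```python
# Symmetric quantization (used for weights) maps [-max|w|, +max|w|] onto
# [-127, 127] with zero-point 0; asymmetric quantization (used for feature
# maps) maps [min(x), max(x)] onto [0, 255] with a nonzero zero-point.
import numpy as np

def quantize_symmetric_int8(w):
    """Per-tensor symmetric int8 quantization, as applied to weights."""
    scale = max(np.max(np.abs(w)) / 127.0, 1e-12)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale  # dequantize with q * scale

def quantize_asymmetric_uint8(x):
    """Per-tensor asymmetric 8-bit quantization, as applied to feature maps."""
    x_min, x_max = float(x.min()), float(x.max())
    scale = max((x_max - x_min) / 255.0, 1e-12)
    zero_point = int(np.round(-x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point  # dequantize with (q - zero_point) * scale
```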
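The loop nest below illustrates the tiling and channel unrolling described in point 2. On the FPGA the two channel loops marked "unrolled" become parallel multiply-accumulate units; plain Python loops stand in for them here. The tile sizes TM/TN/TR/TC are assumed values, not the thesis's chosen parameters.

```python
import numpy as np

TM, TN = 4, 4   # output-channel / input-channel unroll factors (assumed)
TR, TC = 8, 8   # output tile height / width (assumed)

def conv_tiled(ifm, weights):
    """ifm: (N, H, W) input feature maps; weights: (M, N, K, K); stride 1, no padding."""
    N, H, W = ifm.shape
    M, _, K, _ = weights.shape
    OH, OW = H - K + 1, W - K + 1
    ofm = np.zeros((M, OH, OW), dtype=np.int32)  # wide accumulators for int8 data
    for mo in range(0, M, TM):               # tile over output channels
        for no in range(0, N, TN):           # tile over input channels
            for ro in range(0, OH, TR):      # tile over output rows
                for co in range(0, OW, TC):  # tile over output columns
                    # Everything below operates on one tile held in BRAM.
                    for m in range(mo, min(mo + TM, M)):      # unrolled on FPGA
                        for n in range(no, min(no + TN, N)):  # unrolled on FPGA
                            for r in range(ro, min(ro + TR, OH)):
                                for c in range(co, min(co + TC, OW)):
                                    for kr in range(K):
                                        for kc in range(K):
                                            ofm[m, r, c] += (int(weights[m, n, kr, kc])
                                                             * int(ifm[n, r + kr, c + kc]))
    return ofm
```

Tiling keeps the working set of each inner loop inside BRAM, so the unrolled channel loops can feed their multipliers every cycle without waiting on DDR.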
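The sketch below shows how the ARM side might drive the convolution IP core over the AXI bus using the PYNQ Overlay API, matching the task split in point 3. The bitstream name, IP instance name, buffer shapes, and register offsets are hypothetical placeholders for the thesis's actual design.

```python
from pynq import Overlay, allocate

overlay = Overlay("yolov2_tiny.bit")   # hypothetical bitstream name
conv_ip = overlay.conv_accel_0         # hypothetical IP instance name

# Contiguous DDR buffers that the FPGA can reach over AXI (example shapes).
ifm = allocate(shape=(16, 32, 32), dtype="int8")
ofm = allocate(shape=(32, 30, 30), dtype="int8")

# ... ARM side: load and pre-process the input image into ifm ...

# Pass physical addresses to the IP, start it, and poll for completion
# (0x10/0x18 and the ap_ctrl bits below are placeholder offsets).
conv_ip.write(0x10, ifm.physical_address)
conv_ip.write(0x18, ofm.physical_address)
conv_ip.write(0x00, 1)                  # ap_start
while (conv_ip.read(0x00) & 0x2) == 0:  # wait for ap_done
    pass

# ... ARM side: post-processing (decode boxes, NMS) on ofm ...
```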
Keywords/Search Tags: FPGA, Quantization, Object detection, Neural network accelerator, Parallel computing