
Research On FPGA-based Accelerator For Object Detection Neural Network

Posted on: 2024-03-04    Degree: Master    Type: Thesis
Country: China    Candidate: Z F Yue    Full Text: PDF
GTID: 2558307067468374    Subject: Software engineering
Abstract/Summary:
Object detection is one of the most important applications in the field of artificial intelligence, and object detection algorithms are now used in many aspects of daily life. However, deploying these algorithms on embedded devices is not easy. First, embedded devices usually have limited resources, such as memory, storage space, and processing power, and therefore cannot host large artificial intelligence models. Second, embedded devices generally run on limited power sources, which constrains the runtime and energy consumption of AI models. With these factors in mind, this paper focuses on deploying object detection networks on embedded devices and optimizes them in both software and hardware. The main research contributions are as follows:

1. Model quantization. To address the limited storage space and processing power of embedded devices, this paper studies model quantization on the software side. The YOLOv2-tiny model is quantized to int8, with symmetric quantization applied to the weights and asymmetric quantization applied to the feature maps (a sketch of the two schemes follows this abstract). Quantization not only reduces storage requirements but also speeds up data transmission and computation.

2. Designing a convolution accelerator. This paper implements convolution and pooling acceleration units on the FPGA. First, because of the limited BRAM capacity, the convolution is partitioned into tiles. The tiled convolutions are then unrolled along the input and output channels to enable parallel computation, and the five stages of the convolution are pipelined to improve the throughput of the convolution unit (the loop structure is sketched below). To improve memory access speed, a three-level memory hierarchy of SD card, DDR, and BRAM is built, with BRAM space allocated to the input feature maps, output feature maps, and weight parameters. Data reuse techniques are also studied to increase the number of times each datum is reused and to reduce the frequency of off-chip memory accesses. Together, these optimizations shorten the latency of convolution and pooling.

3. Building an object detection system. This paper implements an object detection system on the PYNQ-Z2. The tasks are first divided: the ARM side handles image pre-processing and post-processing, which are computation-heavy but poorly parallelizable, while the FPGA side handles convolution and pooling acceleration. The ARM and FPGA are connected by the AXI bus, and the ARM accelerates neural network inference by invoking the convolution and pooling IP cores (an example invocation is sketched below).

In summary, by optimizing both the hardware and the software, the YOLOv2-tiny network is successfully deployed on a resource-limited FPGA chip. By exploiting the heterogeneous ARM-FPGA architecture, the system makes full use of the PYNQ-Z2's resources and accelerates inference.
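The following is a minimal sketch of the two int8 quantization schemes named in point 1, assuming per-tensor granularity; the function names and the epsilon guard are illustrative assumptions, not the thesis's exact implementation.

```python
# Symmetric quantization (used for weights) maps [-max|w|, +max|w|] onto
# [-127, 127] with zero-point 0; asymmetric quantization (used for feature
# maps) maps [min(x), max(x)] onto [0, 255] with a nonzero zero-point.
import numpy as np

def quantize_symmetric_int8(w):
    """Per-tensor symmetric int8 quantization, as applied to weights."""
    scale = max(np.max(np.abs(w)) / 127.0, 1e-12)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale  # dequantize with q * scale

def quantize_asymmetric_uint8(x):
    """Per-tensor asymmetric 8-bit quantization, as applied to feature maps."""
    x_min, x_max = float(x.min()), float(x.max())
    scale = max((x_max - x_min) / 255.0, 1e-12)
    zero_point = int(np.round(-x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point  # dequantize with (q - zero_point) * scale
```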
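The loop nest below illustrates the tiling and channel unrolling described in point 2. On the FPGA the two channel loops marked "unrolled" become parallel multiply-accumulate units; plain Python loops stand in for them here. The tile sizes TM/TN/TR/TC are assumed values, not the thesis's chosen parameters.

```python
import numpy as np

TM, TN = 4, 4   # output-channel / input-channel unroll factors (assumed)
TR, TC = 8, 8   # output tile height / width (assumed)

def conv_tiled(ifm, weights):
    """ifm: (N, H, W) input feature maps; weights: (M, N, K, K); stride 1, no padding."""
    N, H, W = ifm.shape
    M, _, K, _ = weights.shape
    OH, OW = H - K + 1, W - K + 1
    ofm = np.zeros((M, OH, OW), dtype=np.int32)  # wide accumulators for int8 data
    for mo in range(0, M, TM):               # tile over output channels
        for no in range(0, N, TN):           # tile over input channels
            for ro in range(0, OH, TR):      # tile over output rows
                for co in range(0, OW, TC):  # tile over output columns
                    # Everything below operates on one tile held in BRAM.
                    for m in range(mo, min(mo + TM, M)):      # unrolled on FPGA
                        for n in range(no, min(no + TN, N)):  # unrolled on FPGA
                            for r in range(ro, min(ro + TR, OH)):
                                for c in range(co, min(co + TC, OW)):
                                    for kr in range(K):
                                        for kc in range(K):
                                            ofm[m, r, c] += (int(weights[m, n, kr, kc])
                                                             * int(ifm[n, r + kr, c + kc]))
    return ofm
```

Tiling keeps the working set of each inner loop inside BRAM, so the unrolled channel loops can feed their multipliers every cycle without waiting on DDR.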
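The sketch below shows how the ARM side might drive the convolution IP core over the AXI bus using the PYNQ Overlay API, matching the task split in point 3. The bitstream name, IP instance name, buffer shapes, and register offsets are hypothetical placeholders for the thesis's actual design.

```python
from pynq import Overlay, allocate

overlay = Overlay("yolov2_tiny.bit")   # hypothetical bitstream name
conv_ip = overlay.conv_accel_0         # hypothetical IP instance name

# Contiguous DDR buffers that the FPGA can reach over AXI (example shapes).
ifm = allocate(shape=(16, 32, 32), dtype="int8")
ofm = allocate(shape=(32, 30, 30), dtype="int8")

# ... ARM side: load and pre-process the input image into ifm ...

# Pass physical addresses to the IP, start it, and poll for completion
# (0x10/0x18 and the ap_ctrl bits below are placeholder offsets).
conv_ip.write(0x10, ifm.physical_address)
conv_ip.write(0x18, ofm.physical_address)
conv_ip.write(0x00, 1)                  # ap_start
while (conv_ip.read(0x00) & 0x2) == 0:  # wait for ap_done
    pass

# ... ARM side: post-processing (decode boxes, NMS) on ofm ...
```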
Keywords/Search Tags: FPGA, Quantization, Object detection, Neural network accelerator, Parallel computing