| Deep learning is an important research direction in the field of artificial intelligence. Its purpose is to enable algorithms to learn from sample data and acquire an ability to analyze and solve problems similar to that of the human brain. In recent years, target detection algorithms based on deep learning have been applied in intelligent security, unmanned driving, and intelligence applications, owing to their higher detection accuracy and stronger robustness and generalization ability compared with traditional target detection algorithms. While deep learning target detection brings convenience, deploying it in practice also faces challenges. Most existing artificial intelligence systems are deployed on a heterogeneous CPU+GPU platform to accelerate deep learning algorithms, with the GPU as the main hardware. GPUs offer powerful computing capability and can significantly improve the training and inference efficiency of deep learning algorithms. However, a GPU must support many types of computation and lacks specialization, which results in a large chip size and high power consumption and makes it difficult to deploy in scenarios where power consumption and performance are limited. In response to this problem, this paper proposes a target detection algorithm acceleration system for a heterogeneous CPU+FPGA platform. An FPGA is a programmable device with high flexibility, high throughput, and low power consumption, and its internal resources can be used to implement efficient parallel acceleration of algorithms. This paper selects the currently popular YOLOv5 target detection algorithm, uses a software and hardware co-design method to divide tasks reasonably between the two, and applies lightweight processing to the algorithm. Operator fusion is added to improve inference speed, and an 8-bit fixed-point quantization scheme is used to make the model more suitable for deployment on FPGA devices. The paper explores an efficient hardware acceleration architecture for target detection algorithms and implements convolution, up-sampling, down-sampling, splicing (concatenation), and other functional operators at the RTL level. An AI instruction set is designed and host computer software is developed to realize the CPU+FPGA heterogeneous acceleration system. In testing, the YOLOv5s model deployed in the system reaches a frame rate of 85 FPS and the lightweight Mobilenet-YOLOv5s model reaches 101 FPS, with a power consumption of only 20 W, which meets the requirements of real-time target detection in low-power scenarios. |
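
To make the 8-bit fixed-point quantization step mentioned above more concrete, the following is a minimal sketch in plain Python. The abstract does not give the details of the scheme, so everything here is an assumption for illustration: a signed 8-bit format with a per-tensor power-of-two scale (so that rescaling becomes a bit shift in hardware), and the hypothetical helper names `choose_frac_bits`, `quantize`, and `dequantize`.

```python
import math


def choose_frac_bits(values, total_bits=8):
    """Pick the number of fractional bits for a signed fixed-point format.

    Assumption for illustration: one power-of-two scale per tensor, chosen so
    that the largest magnitude still fits into a signed `total_bits` integer.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return total_bits - 1
    qmax = 2 ** (total_bits - 1) - 1
    # Largest frac_bits such that max_abs * 2**frac_bits <= qmax.
    return math.floor(math.log2(qmax / max_abs))


def quantize(values, frac_bits, total_bits=8):
    """Convert floats to signed fixed-point integers, saturating at the range limits."""
    qmin = -(2 ** (total_bits - 1))
    qmax = 2 ** (total_bits - 1) - 1
    scale = 2.0 ** frac_bits
    # Python's round() rounds exact halves to the nearest even integer.
    return [max(qmin, min(qmax, round(v * scale))) for v in values]


def dequantize(q_values, frac_bits):
    """Map fixed-point integers back to floats to inspect the quantization error."""
    scale = 2.0 ** frac_bits
    return [q / scale for q in q_values]


if __name__ == "__main__":
    weights = [0.5, -0.25, 0.9, -1.3]  # made-up example weights
    frac_bits = choose_frac_bits(weights)
    q = quantize(weights, frac_bits)
    print(frac_bits, q, dequantize(q, frac_bits))
```

A power-of-two scale is only one possible design choice. It is often attractive on FPGAs because multiplying or dividing by the scale reduces to a shift, but the paper's actual scheme may differ, for example by using per-channel scales.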