In recent years, numerous studies have shown that neural networks offer significant advantages over traditional algorithms, and they have been widely applied in image, speech, and video recognition. However, because neural networks place enormous demands on the computing capacity and storage of the hardware platform, practical deployment remains difficult. Existing CPU platforms cannot provide sufficient computing capacity, while GPU platforms have become the first choice for neural networks owing to their high computing capacity and easy-to-use development frameworks. At the same time, FPGA-based neural network accelerators are a hotspot of current research: with a purpose-built hardware design, an FPGA can achieve higher processing speed than a GPU while also offering better energy efficiency. In this context, this thesis explores parallel acceleration methods for a neural network object detection algorithm on FPGAs, motivated by the real-time vehicle and pedestrian detection requirements of autonomous driving.

This thesis first studies neural network object detection algorithms and FPGA-based neural network accelerators. Based on the SSD (Single Shot MultiBox Detector) detection algorithm and the ResNet (Deep Residual Network) architecture, it then designs a ResNet18-SSD model, which maintains the accuracy of the VGG-SSD model while effectively reducing computation and storage complexity.

For the inference process of this model, this thesis designs a tiled acceleration scheme based on two Xilinx VU9P FPGAs. To preserve prediction accuracy while avoiding direct floating-point arithmetic on the FPGA, the model is quantized to fixed point using the Ristretto tool. Different hardware structures are designed for different types of convolutional layers: the Winograd algorithm is implemented for the 3×3, stride-1 convolutional layers, the original convolution algorithm is implemented for the 3×3, stride-2 convolutional layers, and the implementation is specifically optimized for the Xilinx DSP slices. In addition, this thesis designs data caching and reuse structures so that the entire computation is pipelined and does not depend on off-chip DDR. Beyond the convolutional layers, hardware structures are also designed for other layer types such as pooling and element-wise sum.

Finally, this thesis uses the VCU118 evaluation board to build an FPGA-based neural network acceleration platform and implements the proposed design. To achieve a complete demonstration, a corresponding host-side processing program is developed, and PCIe and QSFP interface logic is implemented inside the FPGA. The quantized ResNet18-SSD model achieves a prediction accuracy of 0.797 on the KITTI dataset, 3.1% lower than before quantization. With an input image size of 512×768×3, the FPGA processing latency is 84.23 ms, the frame rate is 11.87 FPS (Frames Per Second), and the overall system throughput is 803 GOPS.
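
As a rough illustration of the Winograd convolution used for the 3×3, stride-1 layers (the thesis targets the 2-D case; the 1-D F(2,3) building block is shown here for brevity), the following C sketch computes two outputs of a 3-tap convolution with four multiplications instead of six. The function and variable names are illustrative only and are not taken from the thesis's implementation.

    /* Minimal 1-D Winograd F(2,3) sketch: two outputs of a 3-tap
     * convolution using 4 multiplications instead of 6.
     * Names (winograd_f2_3, d, g, y) are illustrative only. */
    #include <stdio.h>

    static void winograd_f2_3(const float d[4], const float g[3], float y[2])
    {
        /* Transformed filter (can be precomputed once per layer). */
        float u0 = g[0];
        float u1 = 0.5f * (g[0] + g[1] + g[2]);
        float u2 = 0.5f * (g[0] - g[1] + g[2]);
        float u3 = g[2];

        /* Element-wise products in the transformed domain. */
        float m0 = (d[0] - d[2]) * u0;
        float m1 = (d[1] + d[2]) * u1;
        float m2 = (d[2] - d[1]) * u2;
        float m3 = (d[1] - d[3]) * u3;

        /* Inverse transform: the two convolution outputs. */
        y[0] = m0 + m1 + m2;
        y[1] = m1 - m2 - m3;
    }

    int main(void)
    {
        float d[4] = {1.0f, 2.0f, 3.0f, 4.0f};
        float g[3] = {0.5f, 1.0f, -1.0f};
        float y[2];
        winograd_f2_3(d, g, y);
        /* Matches the direct 3-tap convolution:
         * y[0] = d0*g0 + d1*g1 + d2*g2, y[1] = d1*g0 + d2*g1 + d3*g2 */
        printf("%f %f\n", y[0], y[1]);
        return 0;
    }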
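
Similarly, the Ristretto-based fixed-point quantization mentioned above can be illustrated, in spirit, by the following hedged sketch of dynamic fixed-point conversion (a power-of-two scale with rounding and saturation). The 8-bit width, the helper names, and the chosen format are assumptions for illustration and do not reproduce the thesis's or Ristretto's actual code.

    /* Hedged sketch of fixed-point quantization to 8 bits:
     * value is approximated by q * 2^(-frac_bits), with rounding and
     * saturation. Names and the 8-bit width are illustrative assumptions. */
    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>

    static int8_t quantize_q8(float x, int frac_bits)
    {
        long q = lroundf(x * (float)(1 << frac_bits)); /* scale and round */
        if (q > 127)  q = 127;                         /* saturate to int8 range */
        if (q < -128) q = -128;
        return (int8_t)q;
    }

    static float dequantize_q8(int8_t q, int frac_bits)
    {
        return (float)q / (float)(1 << frac_bits);
    }

    int main(void)
    {
        int frac_bits = 5;                 /* example fractional bit width */
        float w = 0.8125f;                 /* example weight value */
        int8_t q = quantize_q8(w, frac_bits);
        printf("q = %d, reconstructed = %f\n", q, dequantize_q8(q, frac_bits));
        return 0;
    }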