
Hardware Accelerated SoC Design For Object Detection Based On RISC-V CPU

Posted on: 2022-11-16
Degree: Master
Type: Thesis
Country: China
Candidate: H B Zhang
Full Text: PDF
GTID: 2518306761952969
Subject: Computer Software and Application of Computer
Abstract/Summary:
In recent years, object detection algorithms based on deep learning have developed rapidly, and traditional CPU architectures can no longer meet the growing computation and data requirements. Convolutional neural network hardware accelerators have therefore become a research hot spot and are widely used in IoT and embedded applications, where high performance and low power consumption are the primary design goals. GPU architectures are fast but consume a lot of power, so a heterogeneous CPU+FPGA architecture becomes the preferred choice. It offers high performance and low power consumption, and the flexibility of the FPGA allows the accelerator to be improved in step with optimizations of the algorithm. This thesis therefore designs an object detection hardware acceleration system based on a heterogeneous RISC-V CPU+FPGA architecture. The target algorithm is Yolov3-tiny, a new lightweight object detection algorithm.

Processors in the embedded field are still dominated by ARM, but its patent barriers deter developers and designers. Although the RISC-V software ecosystem is still immature and little system design material based on RISC-V is available, this design uses the fully open-source Rocket Core processor, based on the RISC-V instruction set, as the control end of the heterogeneous architecture, in order to obtain a system with independent intellectual property rights that is not restricted by others. Besides being open source and free, RISC-V has a simplified instruction set, which makes it possible to design processors with lower power consumption, smaller area, and higher performance than ARM processors and makes it better suited to embedded applications.

The design work covers the configuration and compilation of the processor, the hardware design of the image acquisition system, the porting of an operating system to RISC-V, the analysis of the Yolov3-tiny algorithm, and the modeling and design of the object detection hardware acceleration IP. To complete the system functionality, the Linux operating system is ported to the whole SoC. The RISC-V processor startup file and link file are written independently, together with Linux software drivers for the JPEG decoding module, the object detection hardware acceleration IP, VDMA, Ethernet, and other modules. Two image acquisition methods, camera and FTP, are implemented and verified on the FPGA.

In modeling the hardware acceleration IP, the six-fold loop of the convolution layer is decomposed into an eight-fold loop, which simplifies hardware mapping and the reuse of weight data. A five-stage pipelined convolution datapath is designed, and ping-pong caches are used to improve hardware utilization. The layout of features and weights in external memory and in the on-chip cache is designed to simplify the read and write-back logic of the two DMA modules, and the bandwidth bottleneck identified by the Roofline model is resolved through the reuse of weight data. The two-dimensional operation of the pooling layer is converted into successive one-dimensional operations so that the algorithm maps onto hardware more easily. The biggest advantage of the design is that the hardware can be configured flexibly through parameters: the multiplier array, on-chip cache, and bus bandwidth can be changed according to the available platform resources to suit different verification platforms and application scenarios.

Behavior-level simulation of the hardware acceleration IP under different parameter configurations is carried out in Vivado, with two functional simulation modules written to model the CPU and the external memory and to interact with the accelerator. With a 32×16 multiplier array, the hardware acceleration IP processes Yolov3-tiny at 35 ms per image. Finally, the whole object detection SoC is verified on the Zedboard platform, where the limited DSP resources of the FPGA board restrict the multiplier array to 16×8. The mAP reaches 32.6% on the COCO dataset and 58.4% on the VOC dataset, higher than that of other heterogeneous accelerator designs. The detection speed is 4.6 times that of a 20-core server and 777.9 times that of a single-core RISC-V CPU. The energy efficiency is 7.46 GOPS/W, which is 248.7 times that of a 20-core server, 373.2 times that of a RISC-V CPU, and higher than that of NVIDIA's GTX 1660 Ti GPU. The system meets its design goals of high performance, low power consumption, and high precision.
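
The abstract does not spell out which loops are split to turn the six-fold convolution loop into an eight-fold one, or how the pooling layer is reduced to one-dimensional steps. The sketch below is one plausible reading, not the thesis's implementation. It assumes that the output-channel and input-channel loops are each tiled into an outer and an inner loop (the inner pair standing in for a multiplier array such as 32×16), and that max pooling is done as a row pass followed by a column pass. All function names and the tile sizes `tm` and `tc` are hypothetical.

```python
def conv_six_loops(feat, weight):
    """Reference convolution with six nested loops (stride 1, no padding).

    feat is indexed [channel][row][col]; weight is [out_ch][in_ch][k_row][k_col].
    """
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    M, K = len(weight), len(weight[0][0])
    OH, OW = H - K + 1, W - K + 1
    out = [[[0] * OW for _ in range(OH)] for _ in range(M)]
    for m in range(M):
        for c in range(C):
            for y in range(OH):
                for x in range(OW):
                    for i in range(K):
                        for j in range(K):
                            out[m][y][x] += weight[m][c][i][j] * feat[c][y + i][x + j]
    return out


def conv_eight_loops(feat, weight, tm=2, tc=2):
    """Same convolution with both channel loops tiled (assumed decomposition).

    The two innermost loops cover one tm x tc tile of channels; in hardware
    they would be unrolled onto the multiplier array. The weights of a tile
    are reused for every output position before the next tile is loaded.
    """
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    M, K = len(weight), len(weight[0][0])
    OH, OW = H - K + 1, W - K + 1
    out = [[[0] * OW for _ in range(OH)] for _ in range(M)]
    for m0 in range(0, M, tm):                  # 1: output-channel tiles
        for c0 in range(0, C, tc):              # 2: input-channel tiles
            for y in range(OH):                 # 3: output rows
                for x in range(OW):             # 4: output columns
                    for i in range(K):          # 5: kernel rows
                        for j in range(K):      # 6: kernel columns
                            for m in range(m0, min(m0 + tm, M)):      # 7: within tile
                                for c in range(c0, min(c0 + tc, C)):  # 8: within tile
                                    out[m][y][x] += (
                                        weight[m][c][i][j] * feat[c][y + i][x + j]
                                    )
    return out


def maxpool_2d(plane, k=2, s=2):
    """Direct two-dimensional max pooling over k x k windows with stride s."""
    H, W = len(plane), len(plane[0])
    return [
        [max(plane[y + i][x + j] for i in range(k) for j in range(k))
         for x in range(0, W - k + 1, s)]
        for y in range(0, H - k + 1, s)
    ]


def maxpool_two_pass(plane, k=2, s=2):
    """Max pooling as two one-dimensional passes: along rows, then along columns."""
    H, W = len(plane), len(plane[0])
    # Pass 1: reduce each row with a 1-D window of width k.
    rows = [[max(row[x:x + k]) for x in range(0, W - k + 1, s)] for row in plane]
    # Pass 2: reduce each column of the intermediate result with a 1-D window.
    return [
        [max(rows[y + i][x] for i in range(k)) for x in range(len(rows[0]))]
        for y in range(0, H - k + 1, s)
    ]
```

Both rewritten forms compute the same values as their references. Tiling only reorders the accumulation, and the maximum over a k×k window equals the maximum of its row maxima. The tiled form lets a block of weights stay on chip while the whole feature map streams past it, which is the kind of weight reuse the abstract credits with relieving the bandwidth bottleneck.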
Keywords/Search Tags: RISC-V CPU, Object detection, Hardware acceleration system, Yolov3-tiny, FPGA