In recent years, neural networks have been widely discussed and applied in many fields. Among them, the convolutional neural network (CNN) has driven significant breakthroughs in computer vision, achieving unprecedented accuracy in image classification, object detection, and semantic segmentation. Object detection is an important branch of computer vision. Because most neural-network-based object detection algorithms are computationally intensive, graphics processing units (GPUs) are widely used to run them despite their high cost. Considering power consumption and related constraints, however, GPUs are hardly suitable for object detection deployed at the edge. Hardware accelerators based on Field-Programmable Gate Arrays (FPGAs) have therefore attracted wide attention: they deliver GPU-like performance at significantly lower power consumption, which makes them well suited to edge devices.

In this paper, we implement a hardware/software co-designed system that deploys CNN-based object detection algorithms at the edge. The system offers good real-time performance and excellent power efficiency, and unlike existing FPGA accelerators, it can flexibly deploy different CNN object detection models. By analyzing the data flow and workflow of the object detection task, we propose the system architecture and partition the functions between software and hardware according to the characteristics of the CPU and the FPGA. The CPU subsystem handles data transfer and control, while the FPGA subsystem implements a hardware accelerator for general matrix multiplication (GEMM) that performs the two core CNN computations: convolution and fully connected layers. The two subsystems are connected by an efficient on-chip AXI bus. Because the FPGA computation module implements general matrix multiplication, and the number of computation modules on the chip can be chosen according to requirements, the system combines strong computing capability with high flexibility.

The FPGA subsystem is the core computing unit of the proposed object detection system and the key factor determining its computation speed. We use multi-stage pipelining and parallel computing to increase the system's parallelism and achieve high-speed data processing. Pipelining between the CPU and the FPGA improves parallelism across CNN layers; pipelining between FPGA computation modules further improves inter-layer parallelism, while cascading modules improves parallelism within a layer. Each FPGA computation module is itself implemented with pipelining. The global memory pool uses data pre-fetching so that FPGA computation and data transfer proceed in parallel.

A lightweight object detection algorithm is optimized and trained for a practical application and is used in the system test to verify functional correctness. To verify the practicality of the system, several convolutional neural network models are deployed on the object detection system, and their real-time performance is measured and compared with existing systems on the market.
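To make the GEMM-based computation described above concrete, the following is a minimal software sketch of how a convolution layer can be lowered to a single matrix multiplication via im2col, the standard way such a mapping is performed. The sizes, function names, and the naive GEMM loop are illustrative assumptions, not the paper's actual hardware implementation.

```c
/* Minimal sketch (assumption, not the paper's RTL): lowering convolution
 * to GEMM via im2col, so that one matrix-multiply engine can serve both
 * convolution and fully connected layers. */
#include <stdio.h>
#include <stdlib.h>

/* Naive GEMM: Cm[M x N] = A[M x Kd] * B[Kd x N].
 * On the accelerator this role is played by the pipelined FPGA module. */
static void gemm(const float *A, const float *B, float *Cm,
                 int M, int N, int Kd) {
    for (int m = 0; m < M; ++m)
        for (int n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (int k = 0; k < Kd; ++k)
                acc += A[m * Kd + k] * B[k * N + n];
            Cm[m * N + n] = acc;
        }
}

/* im2col: unfold KxK patches of a CxHxW input (stride 1, no padding) into
 * a (C*K*K) x (OH*OW) matrix, so the convolution becomes one GEMM. */
static void im2col(const float *in, float *col,
                   int C, int H, int W, int K) {
    int OH = H - K + 1, OW = W - K + 1;
    for (int c = 0; c < C; ++c)
        for (int kh = 0; kh < K; ++kh)
            for (int kw = 0; kw < K; ++kw) {
                int row = (c * K + kh) * K + kw;
                for (int oh = 0; oh < OH; ++oh)
                    for (int ow = 0; ow < OW; ++ow)
                        col[row * (OH * OW) + oh * OW + ow] =
                            in[(c * H + oh + kh) * W + (ow + kw)];
            }
}

int main(void) {
    /* Toy layer: 3x8x8 input, 4 output channels, 3x3 kernels. */
    int C = 3, H = 8, W = 8, K = 3, OC = 4;
    int OH = H - K + 1, OW = W - K + 1;
    float *in  = calloc((size_t)C * H * W, sizeof *in);
    float *wgt = calloc((size_t)OC * C * K * K, sizeof *wgt);
    float *col = calloc((size_t)C * K * K * OH * OW, sizeof *col);
    float *out = calloc((size_t)OC * OH * OW, sizeof *out);
    for (int i = 0; i < C * H * W; ++i) in[i] = (float)i / 100.0f;
    for (int i = 0; i < OC * C * K * K; ++i) wgt[i] = 0.01f * (float)i;

    im2col(in, col, C, H, W, K);
    /* Weights (OC x C*K*K) times unfolded input (C*K*K x OH*OW). */
    gemm(wgt, col, out, OC, OH * OW, C * K * K);
    printf("out[0] = %f\n", out[0]);

    free(in); free(wgt); free(col); free(out);
    return 0;
}
```

A fully connected layer reduces to the same gemm call with the input flattened into a single column, which is why one general matrix-multiplication module can cover both core CNN computations.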
With real-time performance comparable to existing systems, the object detection system designed in this paper achieves better power efficiency. Based on the test results, the proposed system can adapt to different types of convolutional neural network models and provides high real-time performance when power consumption is the primary constraint.
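As a purely illustrative complement to the data pre-fetching mentioned above, the sketch below shows a ping-pong (double-buffer) scheme in which the next data tile is staged while the current tile is processed. The buffer size, the dma_start stub, and the fpga_gemm_tile stub are assumptions; in a real system the fetch would be a non-blocking DMA transfer so that data movement and FPGA computation truly overlap.

```c
/* Minimal sketch (assumption) of double buffering between data transfer
 * and accelerator computation. */
#include <stddef.h>
#include <string.h>
#include <stdio.h>

#define TILE_FLOATS 1024

static float buf[2][TILE_FLOATS];

/* Stand-in for "start DMA into dst"; blocking copy in this sketch. */
static void dma_start(float *dst, const float *src, size_t n) {
    memcpy(dst, src, n * sizeof *dst);
}
/* Stand-in for running the GEMM accelerator on one tile. */
static float fpga_gemm_tile(const float *tile, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i) s += tile[i];
    return s;
}

float process_stream(const float *src, size_t n_tiles) {
    float total = 0.0f;
    int cur = 0;
    if (n_tiles == 0) return total;
    dma_start(buf[cur], src, TILE_FLOATS);              /* prime buffer 0 */
    for (size_t t = 0; t < n_tiles; ++t) {
        int nxt = cur ^ 1;
        if (t + 1 < n_tiles)                            /* prefetch next tile */
            dma_start(buf[nxt], src + (t + 1) * TILE_FLOATS, TILE_FLOATS);
        total += fpga_gemm_tile(buf[cur], TILE_FLOATS); /* compute current tile */
        cur = nxt;                                      /* swap buffers */
    }
    return total;
}

int main(void) {
    static float data[4 * TILE_FLOATS];
    for (size_t i = 0; i < sizeof data / sizeof data[0]; ++i)
        data[i] = 1.0f;
    printf("checksum = %f\n", process_stream(data, 4));
    return 0;
}
```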