
Accelerated Design And Implementation Of SSD Algorithm Based On FPGA

Posted on: 2020-10-21
Degree: Master
Type: Thesis
Country: China
Candidate: D H Qin
Full Text: PDF
GTID: 2428330623455905
Subject: Computer Science and Technology
Abstract/Summary:
With the continuous improvement of computer performance and the deepening of academic research in machine learning, the convolutional neural network (CNN) has become one of the most popular machine learning algorithms in recent years and is widely used in video surveillance, machine vision, pattern recognition, image search, and other fields. CNN algorithms are currently implemented mainly on GPU platforms, but GPUs consume too much power to be practical in embedded systems. ASIC-based CNN accelerators suffer from long development cycles, large cost investments, and a lack of flexibility. Most existing embedded systems are built on a single platform such as ARM or FPGA. An ARM processor makes it quick and easy to build an embedded system, but the specific computation patterns of CNNs run inefficiently on ARM, so satisfactory performance is difficult to achieve. FPGAs offer strong programmability, low-latency designs, and low power consumption; FPGA-based CNN accelerators have therefore attracted growing attention and become an important direction in research on hardware implementations of deep learning algorithms. However, deploying CNN algorithms on FPGAs still faces many challenges: as algorithm theory has developed, newer networks have grown deeper and their layers more complex and diverse, while developing FPGAs with traditional HDL languages entails high development difficulty, long development cycles, and poor portability.

This thesis adopts an ARM+FPGA design to carry out hardware/software co-development of the CNN-based SSD algorithm. The convolutional part of the SSD algorithm involves a total of 68.22 G multiply-accumulate operations, and the number of weights is about 27.44 M, making it both computation-intensive and storage-intensive. At the algorithm level, some network layers of the SSD algorithm are optimized for hardware suitability. The optimized algorithm is retrained on the PASCAL VOC 2007 and VOC 2012 data sets, and the retrained model performs almost identically to the original SSD algorithm. During the design process, the algorithm is partitioned between software and hardware so that the parts with high computational complexity and a high share of runtime are accelerated in hardware. The common characteristics of CNN algorithms are analyzed, and a general-purpose accelerator architecture is designed on the PL side. Dimension splitting and partitioned data-stream management are applied to the convolutional layers so that the accelerator can perform convolutions of any size, and the high-level synthesis (HLS) development method is used to parallelize the levels of the general convolution kernel. The PL-side accelerator efficiently executes network-layer operations such as convolution, pooling, and activation, while the algorithm-specific modules are implemented on the PS side; the algorithm's general-purpose network layers are stacked by calling the PL-side accelerator. High-speed data exchange between the ARM and the FPGA is achieved through the integrated SDSoC design flow, enabling rapid construction of the SSD network.

Finally, the optimized SSD algorithm is implemented on a Xilinx ZCU102 development board. With single-precision floating-point data at 200 MHz, the single-frame detection time is 1.57 s. Compared with a pure ARM software implementation, the hardware/software co-design proposed in this thesis achieves a 110x speedup and a 101x reduction in energy consumption. If fixed-point quantization and pruning compression were further applied, real-time processing of video streams with the SSD algorithm could be realized in the embedded domain.
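The per-layer operation and weight counts of the kind quoted above follow directly from the convolution dimensions. A minimal sketch of that arithmetic, using an illustrative VGG-style 3x3 layer on a 300x300 input (an assumed example shape, not the thesis's exact network configuration):

```python
# Count multiply-accumulate (MAC) operations and weights for one
# convolutional layer from its dimensions. The layer shape below is an
# illustrative example, not taken from the thesis.

def conv_layer_cost(h_out, w_out, c_in, c_out, k):
    """Return (MAC count, weight count) for a k x k convolution layer."""
    macs = h_out * w_out * c_out * c_in * k * k   # one MAC per kernel tap per output element
    weights = c_out * c_in * k * k                # kernel parameters (biases ignored)
    return macs, weights

macs, weights = conv_layer_cost(h_out=300, w_out=300, c_in=3, c_out=64, k=3)
print(f"MACs: {macs / 1e9:.3f} G, weights: {weights / 1e6:.4f} M")
```

Summing these two quantities over every convolutional layer of a network yields totals of the kind reported for the full SSD model (68.22 G MACs and roughly 27.44 M weights).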
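The dimension-splitting idea described above — computing an arbitrarily large convolution as a sequence of fixed-size tiles so that a fixed hardware engine can handle any layer size — can be sketched in software as follows. The tile size, single-channel layout, and valid padding are illustrative assumptions, not the thesis's actual accelerator parameters:

```python
# Software sketch of dimension splitting (loop tiling) for convolution:
# the output is computed tile by tile, each tile reading only the input
# patch it needs -- mimicking a fixed-size accelerator invoked repeatedly.
# Single channel, 'valid' padding; all sizes are illustrative.

K = 3  # kernel size

def conv2d(inp, kern):
    """Direct (reference) 2D convolution with valid padding."""
    h = len(inp) - K + 1
    w = len(inp[0]) - K + 1
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            out[i][j] = sum(inp[i + di][j + dj] * kern[di][dj]
                            for di in range(K) for dj in range(K))
    return out

def conv2d_tiled(inp, kern, tile=4):
    """Same convolution, with the output split into tile x tile blocks."""
    h = len(inp) - K + 1
    w = len(inp[0]) - K + 1
    out = [[0.0] * w for _ in range(h)]
    for ti in range(0, h, tile):            # iterate over output tiles
        for tj in range(0, w, tile):
            th = min(tile, h - ti)          # edge tiles may be smaller
            tw = min(tile, w - tj)
            # input patch covering this tile plus the kernel halo
            patch = [row[tj:tj + tw + K - 1]
                     for row in inp[ti:ti + th + K - 1]]
            block = conv2d(patch, kern)     # the "accelerator call" on one patch
            for i in range(th):
                for j in range(tw):
                    out[ti + i][tj + j] = block[i][j]
    return out
```

Both functions produce identical results for any input size; in the hardware design, the inner per-patch call corresponds to the fixed PL-side convolution engine, and the outer tile loops to the data-stream management that feeds it.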
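The reported performance figures imply the following throughput comparison; note that the ARM-only baseline latency is inferred here from the 1.57 s latency and the 110x speedup, and is not stated directly in the abstract:

```python
# Throughput implied by the reported numbers: 1.57 s per frame on the
# ARM+FPGA design at 200 MHz, and a 110x speedup over the ARM-only
# software baseline. The baseline latency below is inferred from those
# two figures, not stated directly in the source.

accel_latency_s = 1.57                      # per-frame latency, ARM+FPGA
speedup = 110.0                             # reported speedup over ARM-only software

arm_latency_s = accel_latency_s * speedup   # inferred ARM-only latency per frame
accel_fps = 1.0 / accel_latency_s           # ARM+FPGA throughput in frames/s

print(f"ARM-only (inferred): {arm_latency_s:.1f} s/frame")
print(f"ARM+FPGA: {accel_fps:.2f} frames/s")
```

At well under one frame per second, the single-precision design is not yet real-time, which is why the abstract points to fixed-point quantization and pruning as the path to real-time video processing.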
Keywords/Search Tags:Convolutional Neural Network, Hardware Acceleration, ZYNQ, High-Level Synthesis, SSD Algorithm