
Research And Implementation Of Software And Hardware Acceleration For The Lightweight YOLOv4-tiny Algorithm

Posted on: 2024-07-08 | Degree: Master | Type: Thesis
Country: China | Candidate: J Y Zhao | Full Text: PDF
GTID: 2568306917975479 | Subject: Electronic Information (Electronics and Communication Engineering)
Abstract/Summary:
At present, research on object detection algorithms is maturing. Many strong algorithms and models have been deployed on central processing unit (CPU) and graphics processing unit (GPU) platforms with excellent performance indicators, but these platforms suffer from high power consumption, high cost, limited real-time performance, and poor portability. To address these issues, this thesis deploys the model on embedded edge devices. The research content includes the following.

To address the redundant parameters and large size of the model, which make it unsuitable for deployment on edge devices, a method combining channel pruning and data quantization is proposed to lighten the YOLOv4-tiny model. First, the model is sparsely trained to obtain a scaling factor for each channel, which measures that channel's importance. Second, the model is pruned: channels with smaller scaling factors are judged less important and removed, eliminating unimportant connections and channels and thus reducing the model's size and parameter count. Third, TensorRT is used to quantize the model, converting data such as model weights from floating-point to fixed-point representation to meet the computing conditions of FPGA devices; quantization also further compresses the model. Together, pruning and quantization lighten the model enough to meet the on-board deployment requirements.

Meanwhile, to meet the requirements of low cost, low power consumption, and high real-time performance, the system is designed on the Xilinx PYNQ framework, and the convolution module, which accounts for the bulk of the model's operations, is optimized with parallel acceleration. Since the convolution module generally occupies about 85% of the total network computation, the pipeline design method in hardware
acceleration is used, and a parallelized three-stage addition-tree and multiplication-tree structure is proposed to accelerate the convolution computation. Because the channels in a convolution operation are independent of one another, computing the channels in parallel yields further hardware acceleration.

Finally, the designed lightweight model was deployed on the PYNQ-Z1 heterogeneous processor platform, and system resource usage, power consumption, and detection time were measured. With minimal loss of accuracy, the single-image detection speed exceeds that of the dual-core ARM Cortex-A9 platform, and the dynamic power consumption is significantly lower than that of the CPU and GPU platforms. Within the board's resource budget, the system meets the real-time requirements, with latency far below that of the CPU platform, indicating a degree of practical feasibility.
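The channel-selection step of the pruning pipeline described above can be sketched as follows. This is a minimal illustration, assuming the scaling factors are the per-channel batch-normalization γ values collected after sparsity training; the function name and the prune ratio are illustrative, not taken from the thesis:

```python
import numpy as np

def select_channels(gammas, prune_ratio):
    """Rank channels by the magnitude of their BN scaling factors and
    keep the most important (1 - prune_ratio) fraction."""
    n_keep = max(1, int(len(gammas) * (1.0 - prune_ratio)))
    order = np.argsort(np.abs(gammas))[::-1]  # most important first
    keep = np.sort(order[:n_keep])            # restore original channel order
    return keep

# Toy example: 8 channels, prune half of them.
gammas = np.array([0.90, 0.02, 0.45, 0.01, 0.60, 0.03, 0.75, 0.05])
kept = select_channels(gammas, prune_ratio=0.5)  # -> channels [0, 2, 4, 6]
```

In the full pipeline, sparsity training first adds an L1 penalty on the γ values to the loss, which drives unimportant channels' factors toward zero before this selection is made.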
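The quantization step can be illustrated with a simplified symmetric INT8 scheme. TensorRT's actual calibration is more involved (it chooses scales from activation statistics), so this sketch only shows the basic float-to-fixed-point mapping and the compression it implies (8-bit storage instead of 32-bit):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: map float weights onto int8
    using a single scale derived from the largest magnitude."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values for accuracy checks."""
    return q.astype(np.float32) * scale

w = np.array([-1.0, -0.5, 0.0, 0.25, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
```

The reconstruction error of each weight is bounded by the scale, which is the usual accuracy/size trade-off the thesis accepts in exchange for FPGA-friendly fixed-point arithmetic.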
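The parallelization idea for the convolution module can also be sketched in software: the products inside a kernel window are reduced with a pairwise (tree-shaped) adder of logarithmic depth rather than a serial chain, and each output channel can be handled by an independent compute instance since channels do not interact. This is an illustrative software analogue of the hardware design, not the HLS implementation itself:

```python
import numpy as np

def adder_tree_sum(products):
    """Pairwise tree reduction: log2(n) stages of adders instead of a
    serial accumulation chain, the software analogue of the adder tree."""
    vals = list(products)
    while len(vals) > 1:
        nxt = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:          # carry an odd leftover to the next stage
            nxt.append(vals[-1])
        vals = nxt
    return vals[0]

def conv2d_single_channel(x, k):
    """Valid 3x3 convolution for one channel; in the hardware design each
    output channel is computed by its own parallel instance of this logic."""
    H, W = x.shape
    out = np.empty((H - 2, W - 2))
    for i in range(H - 2):
        for j in range(W - 2):
            prods = (x[i:i + 3, j:j + 3] * k).ravel()  # 9 parallel multiplies
            out[i, j] = adder_tree_sum(prods)          # tree-shaped reduction
    return out
```

On the FPGA, the multiplies of one window fire in the same cycle and the adder stages are pipelined, which is what turns this per-window loop body into a high-throughput streaming unit.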
Keywords/Search Tags: Deep learning, Object detection, YOLO, Hardware and software cooperation, PYNQ