Software-hardware Co-design Of CNN Adaptive Dynamic Reconfiguration System Based On FPGA

Posted on: 2024-09-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Luo
GTID: 2568306941488574
Subject: Electronic Science and Technology
Abstract/Summary:
In an era of rapid technological development, artificial intelligence (AI) technologies represented by deep learning are playing an increasingly important role in many fields. Among them, Convolutional Neural Networks (CNNs) have been widely used in image recognition, object detection, and related tasks, with remarkable results. However, in today's multi-scenario, multi-load application environments, acceleration circuits still lack flexibility and adaptive adjustment ability, and their acceleration efficiency is low for some networks. This thesis therefore designs an adaptive reconfiguration platform based on the Xilinx DPU for the adaptive deployment of CNN inference systems under dynamic loads. To address the drop in energy efficiency and resource utilization that the system suffers on some CNNs, a hardware architecture compatible with standard convolution, depthwise separable convolution, and deconvolution is proposed. The main research contents of the thesis are as follows:

1. This thesis describes the structure and computational principles of CNNs, with a focus on the parallel efficiency problems of depthwise separable convolution and deconvolution in hardware acceleration. In highly parallel architectures, the resource utilization of depthwise separable convolution is far lower than that of standard convolution because of its lower computational intensity and its lack of output-channel parallelism. In the conventional computation mode of deconvolution, the large number of inserted zero elements reduces its acceleration efficiency in the circuit. This thesis also introduces the features and reconfiguration characteristics of the DPU platform and analyzes the performance of different network models.

2. Building on the configurable features of the DPU general-purpose acceleration unit, a flexible adaptive reconfiguration scheme is proposed to meet the needs of various scenarios, and feasibility analysis and experimental verification are performed. Based on the roofline model of the DPU, the performance difference between standard convolution and depthwise separable convolution is analyzed. A regression model is trained on a dataset of single-layer CNNs to predict the runtime of different CNNs. Based on this prediction model, a reconfiguration scheme is designed to optimize system energy consumption and resource usage. The max and compact modes are configured to reduce the resources occupied by non-common operators. The adaptive platform is built on the ZCU102 platform, and the prediction accuracy of the model reaches 90.7%. Experimental results in the ADAS field show that the adaptive reconfiguration system significantly improves energy consumption and resource utilization compared to the baseline scheme and the reconfiguration scheme.

3. This thesis proposes a data flow and hardware architecture that is compatible with both forward convolution and deconvolution. It aims to mitigate the low parallel efficiency of depthwise separable convolution and to reduce the drop in energy efficiency and resource utilization when switching between standard convolution networks and lightweight networks. The accelerator's computation data flow is optimized through data quantization, inter-layer fusion, and conversion of the deconvolution computation mode. In addition, the thesis proposes a computation unit that is parallelized across elements and input channels, and a block-oriented input/output data mapping scheme that reduces the number of memory accesses and the cost of data transfer. The proposed accelerator is evaluated on the convolution layers of VGG16 and MobileNet. At a frequency of 200 MHz on the Xilinx ZCU102 platform, the optimized architecture achieves 1.64× and 1.14× improvements over the DPU in resource utilization and energy efficiency, respectively.
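The roofline argument in item 2 can be illustrated with a small calculation. The sketch below is not the thesis's model: the layer shape, the 8-bit data width, the peak throughput, and the memory bandwidth are all assumed values chosen for illustration, and the function names are hypothetical. It counts multiply-accumulate operations and off-chip traffic for one stride-1 layer, then applies the usual roofline bound, min(peak, intensity × bandwidth).

```python
def conv_layer_cost(h, w, c_in, c_out, k, depthwise=False, bytes_per_elem=1):
    """Return (MACs, bytes moved) for one stride-1, same-padded conv layer.

    Assumes every input, weight, and output element crosses the memory
    interface exactly once (an idealised traffic model).
    """
    if depthwise:
        # Each input channel is filtered by its own k x k kernel.
        macs = h * w * c_in * k * k
        weights = c_in * k * k
        out_elems = h * w * c_in
    else:
        # Every output channel reads every input channel.
        macs = h * w * c_in * c_out * k * k
        weights = c_in * c_out * k * k
        out_elems = h * w * c_out
    in_elems = h * w * c_in
    traffic = (in_elems + weights + out_elems) * bytes_per_elem
    return macs, traffic


def roofline(macs, traffic, peak_gops, bandwidth_gbs):
    """Return (attainable GOPS, operational intensity in ops per byte)."""
    intensity = 2 * macs / traffic  # one MAC counted as two operations
    return min(peak_gops, intensity * bandwidth_gbs), intensity


# Assumed platform numbers, for illustration only.
PEAK_GOPS = 1000.0
BANDWIDTH_GBS = 10.0

std = roofline(*conv_layer_cost(56, 56, 128, 128, 3), PEAK_GOPS, BANDWIDTH_GBS)
dw = roofline(*conv_layer_cost(56, 56, 128, 128, 3, depthwise=True),
              PEAK_GOPS, BANDWIDTH_GBS)

print(f"standard : {std[1]:.1f} ops/byte, {std[0]:.1f} GOPS attainable")
print(f"depthwise: {dw[1]:.1f} ops/byte, {dw[0]:.1f} GOPS attainable")
```

Under these assumptions the standard layer is compute-bound while the depthwise layer is memory-bound, which is the kind of gap a roofline analysis exposes. Actual figures depend on the DPU configuration and on how much data is reused on chip.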
Keywords/Search Tags: adaptive reconfiguration, depthwise separable convolution, deconvolution, parallel acceleration