The Convolutional Neural Network (CNN) has undergone extensive development and has been widely deployed in scenarios ranging from cloud computing to edge computing. However, due to the high time and space complexity of CNNs, inference on general-purpose processor architectures struggles to meet inference-time requirements. CNN hardware acceleration, one line of research addressing this problem, improves throughput through parallel computing at the hardware level, thereby reducing inference time. Because this approach yields a dedicated processor architecture tailored to the computing and memory characteristics of CNNs, it has become a hot topic among researchers in recent years. However, in edge and ultra-edge computing scenarios, strict constraints on factors such as power, size, and weight limit the resources available for CNNs. As a result, current CNN hardware acceleration methods face challenges in meeting speed and power requirements under these resource-constrained conditions. To address these problems, based on an analysis of current CNN hardware acceleration methods, this paper proposes solutions from three aspects: hardware operators, accelerator architecture, and model co-design methods. The effectiveness and superiority of the proposed methods are validated through board-level experiments and application validation. The main research contents are summarized as follows:

(1) To address the throughput reduction caused by inefficient hardware operators under resource-constrained conditions, this paper proposes a CNN acceleration operator optimization method that incorporates multiple parameters. The method aims to improve operator efficiency by optimizing computing efficiency, memory efficiency, and resource utilization. First, the CNN acceleration operator structure, based on fully pipelined operator nodes, is optimized by sharing activation functions to improve the efficiency of non-linear function computation. Meanwhile, a resource-utilization theory based on shared
factor parameters is constructed for computing-efficiency evaluation. Second, based on an analysis of the local computing characteristics of convolution, row buffers and hybrid memory types are employed to optimize the memory efficiency of feature maps and weights. A corresponding resource theory based on row-buffer and depth-threshold parameters is constructed for memory-efficiency evaluation. Finally, a multi-parameter design space exploration and optimization method is proposed to improve throughput, with the optimization goal of maximizing resource utilization. Experimental results show that the proposed method improves operator computing efficiency, memory efficiency, and resource utilization. Compared with existing research, it effectively increases the throughput of CNN hardware acceleration.

(2) Under resource constraints, to solve the throughput decline that arises when a single-mode computing-engine architecture is mismatched with the varying computational intensity of the network layers of a CNN detection model, this paper proposes an optimization method for the accelerator architecture of CNN detection models based on a hybrid computing engine. Building on an analysis of the computational features of each layer in the CNN detection model, the method matches the computational intensity of each stage to improve throughput. First, a streaming engine with intra-layer mapping optimization is proposed to address the differing computational-intensity features of the backbone network. Second, a single computing engine with inter-layer fusion optimization is proposed to address the similar computational-intensity features of the branch network. Third, a post-processing engine with probability-threshold optimization is proposed to address the sparse computational-intensity features of the post-processing network. Finally, based on the hybrid computing engine, a theoretical resource model for the fully mapped accelerator architecture is constructed. Meanwhile, a method for exploring the design space of
engine parameters under resource constraints is proposed, with the goal of matching the inference time of each stage (backbone network, branch network, and post-processing network). Experimental results show that the optimized accelerator architecture allocates computing resources and matches inference time according to the computational intensity of each stage, thereby effectively improving the throughput of the CNN detection model.

(3) To address the imbalance between network inference speed and accuracy during parameter optimization under resource-constrained conditions, this paper proposes a speed-accuracy co-optimization method for CNN detection models. The method builds a speed-accuracy relationship function for model optimization. First, based on the computational-intensity features of the hybrid computing engine, layer-equalization, operation-regularization, and channel-sparsification construction strategies are proposed for the backbone, branch, and post-processing networks, respectively. The model inference speed is initialized with these strategies and can also be evaluated via resource utilization. Second, based on the correlation between model accuracy and model size, a channel factor reflecting model size is used as a proxy for model accuracy, and a speed-accuracy co-optimization model is constructed from resource utilization and the channel factor. Finally, a parameter optimization method based on sequential interpolation approximation is proposed under computing and memory resource constraints. Experimental results show that this method expands the optimization boundary of detection-model and accelerator-architecture parameters in a bi-directional manner and effectively improves the balance between the speed and accuracy of the detection model.

Based on the above research, application validation is carried out for autonomous human-target detection on a nano Unmanned Aerial Vehicle (nano-UAV). The validation results show that, under resource-constrained
conditions, once CNN hardware acceleration is implemented on the FPGA-based intelligent computing unit of the nano-UAV, the unit meets the application requirements for system power, detection speed, and accuracy.

In summary, the research in this paper has important theoretical significance for enriching CNN hardware acceleration methods, and important application value for extending the application range of CNNs in edge and ultra-edge computing scenarios. The related modeling ideas and optimization methods can also serve as a reference for hardware acceleration research in other fields.
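To make the flavor of contribution (1) concrete, the following is a minimal sketch of a multi-parameter design-space exploration that maximizes throughput under fixed resource budgets. Everything here is invented for illustration: the parameter names (`pe`, `buf_rows`), the `resources` and `throughput` models, and the DSP/BRAM budgets are toy stand-ins, not the resource-utilization theory constructed in the thesis.

```python
# Hypothetical DSE sketch: exhaustively search two operator parameters
# (compute parallelism and row-buffer depth) for the configuration with
# the highest modeled throughput that still fits the resource budgets.

DSP_BUDGET = 220   # assumed number of available multipliers
BRAM_BUDGET = 64   # assumed number of available memory blocks

def resources(pe, buf_rows):
    """Toy resource model: PEs cost DSPs, row buffers cost BRAMs."""
    return pe * 4, buf_rows * 2          # (dsp, bram)

def throughput(pe, buf_rows):
    """Toy performance model: extra PEs only help once the row
    buffer is deep enough to keep them fed."""
    return pe * min(1.0, buf_rows / 3)   # effective MACs per cycle

def explore(pe_range, buf_range):
    best, best_cfg = 0.0, None
    for pe in pe_range:
        for rows in buf_range:
            dsp, bram = resources(pe, rows)
            if dsp <= DSP_BUDGET and bram <= BRAM_BUDGET:
                t = throughput(pe, rows)
                if t > best:
                    best, best_cfg = t, (pe, rows)
    return best_cfg, best

cfg, perf = explore(range(1, 64), range(1, 16))
print(cfg, perf)
```

Even in this toy form, the search exhibits the behavior the abstract describes: the optimum saturates the compute budget while spending no more memory than needed to avoid stalls.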
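The stage-matching idea behind the hybrid computing engine of contribution (2) can likewise be sketched: in a pipelined accelerator, end-to-end throughput is set by the slowest stage, so resources should be allocated in proportion to each stage's workload. The workload numbers and PE budget below are assumed for illustration and do not come from the thesis.

```python
# Hypothetical stage-matching sketch: give each pipeline stage
# (backbone / branch / post-processing) PEs in proportion to its
# workload so that stage latencies (work / pe) are equalized.

WORK = {"backbone": 900.0, "branch": 300.0, "post": 60.0}  # assumed op counts
TOTAL_PE = 42                                              # assumed PE budget

def allocate(work, total_pe):
    """Proportional allocation (at least 1 PE per stage)."""
    total = sum(work.values())
    return {k: max(1, round(total_pe * w / total)) for k, w in work.items()}

def pipeline_latency(work, alloc):
    """Pipeline initiation interval = latency of the slowest stage."""
    return max(w / alloc[k] for k, w in work.items())

alloc = allocate(WORK, TOTAL_PE)
print(alloc, pipeline_latency(WORK, alloc))
```

A uniform split of the same 42 PEs (14 per stage) would leave the backbone as a bottleneck at 900/14 ≈ 64 cycles, while the matched allocation brings every stage to 30 cycles, which is the imbalance the hybrid-engine design is meant to remove.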
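Finally, the speed-accuracy co-optimization of contribution (3) treats a channel factor as a proxy for accuracy and searches for model/accelerator parameters under resource constraints. The sketch below uses simple bisection as a stand-in for the thesis's sequential-interpolation approximation, with an invented quadratic resource-cost model and budget; it only illustrates the shape of the search, not the actual method.

```python
# Hypothetical co-optimization sketch: find the largest channel factor
# (proxy for model size and hence accuracy) whose resource cost still
# fits the budget, via bisection on the feasibility boundary.

RESOURCE_BUDGET = 100.0   # assumed combined compute/memory budget

def resource_cost(channel_factor):
    """Toy model: resources grow quadratically with channel width."""
    return 150.0 * channel_factor ** 2

def widest_feasible(lo=0.0, hi=1.0, tol=1e-3):
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if resource_cost(mid) <= RESOURCE_BUDGET:
            lo = mid      # feasible: try a wider (more accurate) model
        else:
            hi = mid      # infeasible: shrink the model
    return lo

cf = widest_feasible()
print(cf)
```

The returned factor sits on the resource boundary, which mirrors the abstract's claim that the co-optimization pushes the speed-accuracy trade-off as far as the computing and memory constraints allow.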