| Deep learning object detection algorithms are developing rapidly, while the foreseeable end of Moore’s Law will slow progress in processor fabrication. Fully exploiting the computing power of existing intelligent computing platforms has therefore become a practical choice. Based on a heterogeneous intelligent computing platform composed of a CPU and a Cambricon MLU270, this dissertation proposes a flexible deployment and acceleration strategy for the YOLOv5 object detection algorithm that achieves efficient inference with low mAP (mean Average Precision) loss. After a thorough analysis of the characteristics of the YOLOv5 algorithm and of this software-hardware platform, YOLOv5 is first adapted to the platform by splicing and implementing operators. A two-level acceleration strategy is then applied at the algorithm level and the system level. At the algorithm level, the network structure is adjusted by replacing its FOCUS module so that the platform’s characteristics can be exploited more effectively; network slimming is applied to reduce the number of weight parameters; and channel-by-channel INT8 fixed-point quantization is adopted to further reduce the parameter bit width. At the system level, task balancing is used to maximize MLU utilization, and two time-consuming operators are accelerated at the BANG C level. In addition, TFU fusion, graph optimization, and address optimization are performed together during offline model generation, and a three-stage CPU/MLU pipeline is designed and used during offline inference. We evaluated the effect of these strategies experimentally. On the Microsoft COCO 2017 dataset, the optimized YOLOv5s reaches an inference speed of 736 FPS (frames per second), more than 70 times faster than the unoptimized baseline, with an mAP loss of less than 1%; on a private dataset, the inference speed reaches 853 FPS. Finally, an efficient YOLOv5 offline model was deployed in an aerial reconnaissance image recognition system. Experimental results show that for very large aerial images of 7500 × 50000 pixels, the inference speed reaches 0.25 FPS at an mAP of 0.883, laying a solid technical foundation for applications in special scenarios. |
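To make the channel-by-channel INT8 quantization step concrete, here is a minimal sketch of a generic symmetric per-channel scheme on plain Python lists. Each output channel gets its own scale, max|w| / 127, and its weights are rounded and clipped to [-127, 127]. This is an assumption for illustration only. It is not necessarily the exact fixed-point representation used by the Cambricon toolchain, and the function names are hypothetical.

```python
from typing import List, Tuple


def quantize_per_channel_int8(
    weights: List[List[float]],
) -> Tuple[List[List[int]], List[float]]:
    """Symmetric per-channel INT8 quantization (illustrative sketch).

    weights[c] holds the flattened weights of output channel c.
    Returns the integer codes and one float scale per channel.
    """
    codes, scales = [], []
    for channel in weights:
        max_abs = max(abs(w) for w in channel)
        # Each channel gets its own scale, so a channel with small weights is
        # not forced onto the coarse grid of a channel with large weights.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        # round() rounds half to even; the result is clipped to the int8 range.
        q = [max(-127, min(127, round(w / scale))) for w in channel]
        codes.append(q)
        scales.append(scale)
    return codes, scales


def dequantize(codes: List[List[int]], scales: List[float]) -> List[List[float]]:
    """Map integer codes back to approximate float weights."""
    return [[v * s for v in q] for q, s in zip(codes, scales)]
```

For example, with `weights = [[0.5, -1.0, 0.25], [0.01, -0.02, 0.005]]`, the second channel's weights are about 50 times smaller than the first's. A single per-tensor scale of 1/127 would squeeze them into only a few integer levels. Per-channel scales, by contrast, let both channels use the full [-127, 127] range.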
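Network slimming is commonly described as a two-step procedure. First, an L1 penalty on the batch-normalization scaling factors (γ) is added during training to push unimportant channels toward zero. Second, channels whose |γ| falls below a global threshold chosen from a pruning ratio are removed. The sketch below shows only the penalty term and the channel-selection step. The layer names, the penalty weight `lam`, and the pruning ratio are hypothetical, and the dissertation's actual settings are not stated here.

```python
from typing import Dict, List


def sparsity_penalty(bn_gammas: Dict[str, List[float]], lam: float) -> float:
    """L1 term lam * sum(|gamma|) added to the training loss."""
    return lam * sum(abs(g) for gs in bn_gammas.values() for g in gs)


def slimming_masks(
    bn_gammas: Dict[str, List[float]], prune_ratio: float
) -> Dict[str, List[bool]]:
    """Return a keep/prune mask per layer using one global |gamma| threshold."""
    all_g = sorted(abs(g) for gs in bn_gammas.values() for g in gs)
    k = int(len(all_g) * prune_ratio)  # number of channels to prune globally
    threshold = all_g[k] if k < len(all_g) else float("inf")
    masks = {}
    for name, gs in bn_gammas.items():
        mask = [abs(g) >= threshold for g in gs]
        if not any(mask):
            # Keep the strongest channel so the layer is never emptied entirely.
            best = max(range(len(gs)), key=lambda i: abs(gs[i]))
            mask[best] = True
        masks[name] = mask
    return masks
```

With `{"conv1": [0.9, 0.01, 0.5, 0.02], "conv2": [0.03, 0.7, 0.004, 0.6]}` and a pruning ratio of 0.5, the global threshold is 0.5. The call therefore keeps channels 0 and 2 of `conv1` and channels 1 and 3 of `conv2`. A single global threshold lets layers with many weak channels shrink more than layers whose channels are all important.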
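A three-stage CPU/MLU pipeline can be illustrated with one thread per stage connected by bounded FIFO queues. While one frame is being post-processed, the next is in inference and a third is being pre-processed, so the stages overlap instead of running back to back. In this sketch the stage functions are hypothetical placeholders: a real inference stage would call the MLU runtime, which is not shown. Python threads only overlap usefully when a stage releases the GIL, for example while waiting on a device call.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List

_STOP = object()  # sentinel that tells a stage to forward the stop and exit


def _stage(fn: Callable[[Any], Any], inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Apply fn to every item from inbox and push the result to outbox."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)
            return
        outbox.put(fn(item))


def run_three_stage_pipeline(
    frames: Iterable[Any],
    preprocess: Callable[[Any], Any],   # CPU stage (placeholder)
    infer: Callable[[Any], Any],        # MLU stage (placeholder)
    postprocess: Callable[[Any], Any],  # CPU stage (placeholder)
    depth: int = 4,
) -> List[Any]:
    q_in = queue.Queue(maxsize=depth)
    q_pre = queue.Queue(maxsize=depth)
    q_inf = queue.Queue(maxsize=depth)
    q_out = queue.Queue()  # unbounded, so feeding all frames first cannot deadlock
    workers = [
        threading.Thread(target=_stage, args=(preprocess, q_in, q_pre)),
        threading.Thread(target=_stage, args=(infer, q_pre, q_inf)),
        threading.Thread(target=_stage, args=(postprocess, q_inf, q_out)),
    ]
    for w in workers:
        w.start()
    for frame in frames:
        q_in.put(frame)  # blocks when the first stage falls behind
    q_in.put(_STOP)
    results = []
    while True:
        r = q_out.get()
        if r is _STOP:
            break
        results.append(r)
    for w in workers:
        w.join()
    return results  # FIFO queues and one thread per stage preserve frame order
```

For example, `run_three_stage_pipeline(range(5), lambda x: x * 2, lambda x: x + 1, str)` returns `['1', '3', '5', '7', '9']`. The bounded queues cap how many frames can be in flight, which limits host memory use when one stage is slower than the others.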