In recent years, deep learning accelerators and deep learning models have grown increasingly complex, and the computing power demanded by artificial intelligence applications has risen rapidly, placing higher requirements on intelligent software and hardware. At the same time, driven by chip supply bottlenecks and the demand for edge computing power, most software and hardware technology giants and start-ups have developed their own dedicated acceleration hardware. However, further improving hardware performance utilization and the ease of use of intelligent chips, and thereby expanding their market, is impossible without the chip software stack: the hardware architecture determines the peak computing power, but the performance actually achieved is determined by the compilation framework above it. In addition, a single hardware architecture cannot meet the computational requirements of the operators found in complex application scenarios, and manual optimization is difficult. To this end, this thesis studies compilation techniques for heterogeneous platforms and aims to design a deep learning compilation and optimization stack that forms a complete artificial intelligence software and hardware ecosystem, supporting the optimal deployment of models in different application scenarios.

The main research work and contributions of this thesis are as follows:
(1) Designed and implemented an end-to-end model optimization stack for heterogeneous platforms built around dedicated convolutional neural network accelerators.
(2) Proposed a new channel pruning algorithm and adapted it to the intermediate representation designed in this thesis, reducing the engineering cost of applying the compression algorithm across different deep learning frameworks.
(3) Through subgraph splitting, operators and subgraphs not supported by the accelerator are offloaded and converted to an intermediate representation based on the TVM deep learning compiler for joint optimization.
(4) Designed and implemented a visual front-end interface that greatly lowers the barrier to using the compiler.

Experimental results show that the proposed compiler (Tiangong Neural Network Compiler) can optimize models according to hardware characteristics and generate executable code for heterogeneous systems, greatly improving the efficiency and effectiveness of intelligent-chip application development. On a ship target-detection task, with an accuracy loss of no more than 1%, inference speed is increased to 1.3 times on general-purpose devices and to 1.6 times on the dedicated convolutional neural network accelerator, demonstrating that the stack effectively accelerates convolutional neural networks in deployment.
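To illustrate the kind of compression described in contribution (2), below is a minimal sketch of magnitude-based channel pruning, a standard approach in which the convolution filters with the smallest L1 norms are removed. The `prune_channels` function and the `keep_ratio` parameter are illustrative assumptions, not the algorithm actually proposed in the thesis.

```python
import numpy as np

def prune_channels(weight, keep_ratio=0.5):
    """L1-norm channel pruning sketch (illustrative, not the thesis algorithm).

    weight: conv kernel of shape (out_channels, in_channels, kh, kw).
    Keeps the output channels whose filters have the largest L1 norms.
    Returns the pruned weight and the (sorted) indices of kept channels.
    """
    out_channels = weight.shape[0]
    n_keep = max(1, int(out_channels * keep_ratio))
    # One L1 norm per output-channel filter.
    norms = np.abs(weight).reshape(out_channels, -1).sum(axis=1)
    # Indices of the n_keep largest norms, restored to ascending order.
    keep = np.sort(np.argsort(norms)[-n_keep:])
    return weight[keep], keep

# Toy example: prune half of the 8 filters of a 3x3 conv layer.
w = np.random.randn(8, 3, 3, 3)
pruned, kept = prune_channels(w, keep_ratio=0.5)
```

In a real pipeline the corresponding input channels of the following layer must be pruned as well, which is the cross-layer bookkeeping that an intermediate representation can automate across frameworks.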
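The subgraph splitting in contribution (3) can be sketched as a greedy partition of an operator sequence: runs of accelerator-supported operators stay together, while unsupported operators are offloaded to the TVM-based fallback path. The `partition` helper, the operator names, and the "accel"/"tvm" target labels are hypothetical and only convey the general idea, not the actual TNNC implementation.

```python
def partition(ops, supported):
    """Split a linear operator sequence into maximal contiguous segments,
    each targeting either the accelerator ("accel") or a TVM-compiled
    fallback ("tvm"), depending on operator support.
    """
    segments = []
    for op in ops:
        target = "accel" if op in supported else "tvm"
        if segments and segments[-1][0] == target:
            segments[-1][1].append(op)  # extend the current segment
        else:
            segments.append((target, [op]))  # start a new segment
    return segments

# Example: conv2d/relu/concat run on the accelerator; nms falls back to TVM.
ops = ["conv2d", "relu", "conv2d", "nms", "concat"]
supported = {"conv2d", "relu", "concat"}
segments = partition(ops, supported)
# segments == [("accel", ["conv2d", "relu", "conv2d"]),
#              ("tvm", ["nms"]),
#              ("accel", ["concat"])]
```

A real compiler partitions a dataflow graph rather than a linear sequence, but the principle is the same: maximize the size of accelerator-resident subgraphs to minimize host/device transfers at segment boundaries.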