In recent years, deep neural networks have achieved continuous breakthroughs in computer vision and natural language processing. As algorithms have advanced, the demand for deploying neural network applications in cloud, terminal, edge, and other scenarios has gradually increased. However, existing deep learning models have high computational complexity and large numbers of parameters, which poses great challenges for deployment in edge environments such as mobile devices, where hardware resources and power consumption are tightly constrained. Deep model compression and acceleration techniques can greatly reduce the number of parameters and computations with little loss of accuracy, lowering the difficulty of deploying deep models. Targeting the edge environment, this paper conducts research from the perspectives of deep model pruning compression and collaborative inference acceleration. The specific research contents are as follows:

(1) An automatic structured pruning algorithm for deep models based on reinforcement learning. To address the selection of the pruning criterion and pruning rate for each layer during model pruning, a filter pruning scheme that jointly optimizes pruning criteria and pruning rates is proposed. This paper fully considers pruning sensitivity and the internal relationships between layers, re-formulates filter pruning as an optimization problem that minimizes the accuracy loss after pruning subject to a target sparsity, and solves this mixed-variable nonlinear optimization problem with the parameterized deep Q-network (PDQN) algorithm. Experimental results show that, under a given target sparsity, the proposed scheme selects an appropriate pruning criterion and pruning rate for each layer and reduces the accuracy loss after model pruning.

(2) A pruning algorithm based on a spatial and channel attention mechanism. To address the problem of measuring channel importance during pruning, this paper proposes an attention-based channel importance measure. Inspired by the observation that attention mechanisms help a model focus on important features, a Spatial Channel Attention (SCA) module is introduced on each convolutional layer to obtain an attention score for every output channel, and redundant channels are removed according to these scores. The algorithm combines the pruning process with network training, using the attention module to evaluate channel importance at low overhead. Experimental results demonstrate that the scheme selects redundant channels according to their attention scores and reduces the impact of pruning operations on model accuracy.

(3) Research on complexity-aware collaborative inference acceleration. To address the high latency and unstable communication bandwidth faced by deep model inference in edge environments, this paper proposes a complexity-aware collaborative inference scheme. By adjusting the exit threshold of each early-exit branch during progressive inference and the model split point during collaborative inference, the scheme copes with dynamic changes in the edge environment, and a reinforcement learning method is used to optimize the adjustment strategy for the exit thresholds and the split point. Experimental results show that the scheme adapts well to changes in communication bandwidth and input data complexity, and meets the needs of different types of edge intelligence applications.
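To make contribution (1) concrete, the following is a minimal sketch of how a mixed PDQN-style action, a discrete pruning criterion plus a continuous pruning rate, could be applied to a single convolutional layer. The criterion names (`l1`, `gm`) and the use of the mean filter as a cheap stand-in for the geometric median are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def apply_pruning_action(filter_weights, criterion, rate):
    """Score each filter under the chosen criterion and return the indices
    of the lowest-scoring fraction `rate`, i.e. the filters to prune.

    filter_weights: array of shape (n_filters, ...) for one conv layer.
    criterion: 'l1' (magnitude) or 'gm' (distance-to-center), the discrete
               half of the mixed action chosen by the agent.
    rate: pruning rate in [0, 1), the continuous half of the action.
    """
    flat = filter_weights.reshape(len(filter_weights), -1)
    if criterion == 'l1':
        scores = np.abs(flat).sum(axis=1)        # L1-norm importance
    elif criterion == 'gm':
        # distance to the mean filter, a simplified proxy for the
        # geometric-median style "most replaceable filter" criterion
        center = flat.mean(axis=0)
        scores = np.linalg.norm(flat - center, axis=1)
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    n_prune = int(rate * len(flat))
    return np.argsort(scores)[:n_prune]          # prune the lowest scores
```

In a full pipeline, the RL agent would emit one such (criterion, rate) pair per layer and receive the post-pruning accuracy (penalized by sparsity violation) as the reward.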
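For contribution (2), a toy version of attention-based channel scoring can be sketched as follows. The bottleneck structure (global average pool, small two-layer MLP, sigmoid) is a common channel-attention pattern and is assumed here for illustration; the actual SCA module in the thesis also incorporates spatial attention and is trained jointly with the network.

```python
import numpy as np

def channel_attention_scores(feature_map, w1, w2):
    """Toy channel-attention scoring: globally pool each channel, pass the
    pooled vector through a two-layer bottleneck, squash with a sigmoid.

    feature_map: (C, H, W) activations of one conv layer.
    w1, w2: bottleneck weights with shapes (C, r) and (r, C), r << C.
    """
    pooled = feature_map.mean(axis=(1, 2))        # global average pool -> (C,)
    hidden = np.maximum(pooled @ w1, 0.0)         # ReLU bottleneck
    return 1.0 / (1.0 + np.exp(-(hidden @ w2)))   # sigmoid scores in (0, 1)

def redundant_channels(scores, keep_ratio):
    """Indices of channels NOT among the top `keep_ratio` fraction by score;
    these are the candidates for removal."""
    k = int(keep_ratio * len(scores))
    keep = set(np.argsort(scores)[-k:].tolist())
    return sorted(set(range(len(scores))) - keep)
```

During pruning, channels flagged as redundant would have their filters (and the matching input channels of the next layer) deleted, after which the attention module itself can be discarded.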
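Finally, the latency trade-off in contribution (3) can be illustrated with a small simulation of progressive inference over early-exit branches split between a device and a server. All costs, the threshold vector, and the single split point are hypothetical parameters, chosen only to show the knobs the reinforcement learning agent would tune.

```python
def run_collaborative_inference(branch_confidences, thresholds, split_point,
                                device_cost, server_cost, bandwidth, feat_size):
    """Simulate early-exit inference with one device/server split point.

    branch_confidences: confidence the model would report at each early-exit
        branch for this input (a proxy for input complexity).
    thresholds: per-branch exit threshold (a knob tuned by the RL agent).
    split_point: index of the first branch executed on the server
        (the other knob tuned by the RL agent).
    device_cost / server_cost: per-branch compute latency on each side.
    bandwidth: link rate; feat_size: size of the transferred feature map.
    Returns (exit_branch, total_latency).
    """
    latency = 0.0
    for i, (conf, thr) in enumerate(zip(branch_confidences, thresholds)):
        if i == split_point:
            latency += feat_size / bandwidth     # upload intermediate features
        latency += device_cost if i < split_point else server_cost
        if conf >= thr:                          # confident enough: exit early
            return i, latency
    return len(branch_confidences) - 1, latency
```

The simulation makes the adaptation intuitive: easy inputs (high early confidence) exit on-device at low cost, while low bandwidth pushes the optimal split point later so that fewer or smaller features are transmitted.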