With the rapid development of mobile computing and Internet of Things (IoT) technologies, the number of IoT terminals accessing the Internet has grown exponentially. The centralized computing paradigm represented by cloud computing can hardly meet the demands of latency-sensitive tasks under massive user access, so edge computing has emerged. At the same time, artificial intelligence (AI) technology has boomed and has been widely applied in smart cities, smart factories, autonomous driving, and other fields. Applying AI technology to edge computing enriches the application scenarios of computing tasks in the edge environment and extends the service dimension of edge computing. Therefore, the integration of edge computing and AI has become an unstoppable trend. However, AI models represented by Convolutional Neural Networks (CNNs), while providing strong representation and generalization capabilities, demand substantial computational resources, whereas devices in edge environments are typically resource-constrained. As a result, performing CNN inference on edge devices is limited both by the weak computing capabilities of the devices and by the models' heavy computational requirements. To improve the inference speed of CNNs in edge environments, this paper investigates two aspects. On the one hand, to address the restriction of weak device computing capability, the computation of the inference stage is optimized by parallelizing inference tasks, reducing the workload on a single device and ultimately lowering the overall inference latency. On the other hand, to address the high computational cost of convolutional neural networks, this paper optimizes the model structure by pruning the network, obtaining a lightweight model better suited for deployment at the edge. The specific research of this paper is as follows:

(1) To address the challenge of limited computing power and slow execution of CNN inference tasks on edge devices, this paper proposes a parallel inference mechanism for edge clusters. The method reduces the workload on a single device by dividing the inference task into several small subtasks and leverages the clustering effect of edge devices to improve inference speed. Specifically, we propose a multi-fusion-layer block parallelism strategy that reduces inter-device data interactions during parallel inference and thereby minimizes the additional communication overhead. Additionally, we present an adaptive inference workload allocation algorithm that assigns inference tasks to heterogeneous devices in dynamic network environments, optimizing resource utilization and reducing idle waiting time during parallel execution. Experimental simulations were conducted on multiple convolutional neural networks, with extensive comparisons against existing parallel inference methods. The results demonstrate that the proposed method outperforms existing approaches and significantly reduces total inference latency.

(2) To tackle the complex structure and computationally intensive inference of convolutional neural networks, this paper presents an adaptive global pruning mechanism based on relevance scores, which uses structured pruning to obtain lightweight networks and accelerate inference. To accurately identify and eliminate redundant parameters within the network, we employ the relevance decomposition technique from the field of deep learning interpretability. This technique measures the contribution of parameters to the output, allowing us to identify low-contribution parameters and use this measure as the pruning criterion. The proposed adaptive global pruning algorithm dynamically adjusts the pruning rate at each iteration, avoiding the suboptimal results that can arise from manually selected pruning rates. We conduct experiments on different convolutional neural networks, and the results show that the proposed method effectively reduces the redundancy of convolutional neural networks while minimizing the loss of model accuracy. Additionally, because data collection in edge environments is limited by device performance and privacy protection, the data samples available for training are usually constrained; our approach achieves favorable results even in such data-constrained scenarios.
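To make the block-parallel idea in (1) concrete, the following Python sketch illustrates the two ingredients it relies on: back-projecting an output-tile range through a stack of fused convolution/pooling layers to find the input region a device needs (so fused layers can be computed locally without intermediate exchanges), and splitting the output columns among heterogeneous devices in proportion to their speed. The layer parameterization, the column-wise split, and the proportional allocation are illustrative assumptions, not the thesis's exact partitioning or allocation rule.

```python
import numpy as np

def input_range_for_tile(out_start, out_end, layers):
    """Back-project a half-open output-column range [out_start, out_end)
    through fused layers given as (kernel, stride, padding) tuples,
    returning the input-column range the tile depends on."""
    start, end = out_start, out_end
    for k, s, p in reversed(layers):
        start = start * s - p
        end = (end - 1) * s - p + k
    return max(start, 0), end

def split_output_columns(total_cols, speeds):
    """Assign contiguous output-column ranges to devices in proportion to
    their measured speed (a stand-in for the adaptive allocation step)."""
    shares = np.floor(total_cols * np.asarray(speeds) / sum(speeds)).astype(int)
    shares[-1] += total_cols - shares.sum()   # give the rounding remainder to the last device
    bounds = np.concatenate([[0], np.cumsum(shares)])
    return [(int(bounds[i]), int(bounds[i + 1])) for i in range(len(speeds))]

# Example: three fused layers (3x3 conv s=1 p=1, 3x3 conv s=1 p=1, 2x2 pool s=2)
# turn 224 input columns into 112 output columns, split over 3 devices.
fused = [(3, 1, 1), (3, 1, 1), (2, 2, 0)]
for dev, (lo, hi) in enumerate(split_output_columns(112, speeds=[1.0, 2.0, 1.5])):
    print(f"device {dev}: output cols {(lo, hi)}, input cols {input_range_for_tile(lo, hi, fused)}")
```

Because each device receives only the input region its tile requires, the fused layers are evaluated end to end on that device, so data is exchanged only before and after the fused block rather than after every layer.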
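The pruning criterion and the adaptive rate schedule in (2) can be sketched in a similar spirit. The PyTorch snippet below is a hypothetical illustration only: per-filter scores are approximated with an activation-times-gradient proxy collected via hooks on a calibration batch, not the exact relevance-decomposition rule used in this work, and next_pruning_rate is an assumed update rule that shrinks the per-iteration pruning rate when the observed accuracy drop exceeds a budget.

```python
import torch
import torch.nn as nn

def filter_relevance_scores(model, loader, device="cpu"):
    """Accumulate a per-filter relevance proxy (|activation * gradient|
    averaged over batch and spatial dims) for every Conv2d layer."""
    acts, scores, hooks = {}, {}, []

    def make_backward_hook(name):
        def hook(module, grad_input, grad_output):
            rel = (acts[name] * grad_output[0]).abs().mean(dim=(0, 2, 3))
            scores[name] = scores.get(name, 0) + rel.detach()
        return hook

    for name, m in model.named_modules():
        if isinstance(m, nn.Conv2d):
            # save the forward activation, then score it against its gradient
            hooks.append(m.register_forward_hook(
                lambda mod, inp, out, n=name: acts.__setitem__(n, out)))
            hooks.append(m.register_full_backward_hook(make_backward_hook(name)))

    model.to(device)
    for x, y in loader:
        loss = nn.functional.cross_entropy(model(x.to(device)), y.to(device))
        loss.backward()
    for h in hooks:
        h.remove()
    return scores  # {conv layer name: tensor of per-filter scores}

def next_pruning_rate(rate, acc_drop, budget=0.01, step=0.5):
    """Assumed adaptive schedule: shrink the per-iteration pruning rate
    when the accuracy drop exceeds the budget, otherwise keep it."""
    return rate * step if acc_drop > budget else rate
```

In an iterative pruning loop, the lowest-scoring filters across all layers would be removed globally at the current rate, the model briefly fine-tuned, and the rate updated from the measured accuracy drop before the next iteration.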