With the rapid development of communication technology and the information industry, public demand for mobile applications keeps increasing, and traditional resource-centralized cloud computing is gradually becoming insufficient for mobile applications with strict latency constraints. Multi-access Edge Computing (MEC) therefore supplements cloud computing by migrating network traffic and computation from the cloud to the edge of the network. Meanwhile, intelligent applications based on Convolutional Neural Networks (CNNs) are playing an important role in more and more edge computing scenarios. However, running CNN inference in MEC environments still faces two problems. On the one hand, inference speed is limited by the resources of a single edge server, which makes it difficult to meet the real-time requirements of task execution; on the other hand, completing CNN inference on a single edge server leads to poor resource utilization across the MEC environment. How to achieve fast inference of CNN models in an MEC environment while ensuring load balancing among edge servers is therefore an urgent problem. The main work of this paper is as follows.

We first propose a multi-edge assisted CNN inference acceleration technique based on group convolution. In this study, we begin by discussing the structural characteristics of common CNN models and analyzing how their performance metrics vary across layers and why. On this basis, we propose a novel CNN model evaluation metric that determines when to switch to parallel inference so as to maximize parallel inference efficiency. To obtain the optimal division and offloading scheme for CNN parallel inference subtasks, we formulate a nonlinear programming problem whose objective is to minimize the average execution delay of subtasks, and design a heuristic algorithm to solve it. Because this technique requires multiple edge servers to execute CNN inference synchronously and incurs a large communication overhead, it is best suited to homogeneous edge networks with stable network connections and comparable edge server resources.

We further propose a multi-edge assisted CNN inference acceleration technique based on convolution decomposition. In this study, we first design a parallelization scheme better suited to heterogeneous edge networks, which minimizes the communication overhead of edge server collaboration. We then model the generation and offloading of CNN parallel inference subtasks in the accelerated system as a constrained nonlinear programming problem whose objective is to minimize both the average subtask execution delay and the variance of execution delay. Finally, we transform the problem so that it can be solved with the AdaBoost and Random Forest algorithms. Because its communication overhead is small and subtasks are generated adaptively according to edge server resources, this technique can be used in heterogeneous edge networks with greater device heterogeneity and runtime dynamics.

Finally, we conduct extensive experiments to evaluate the performance of the proposed CNN inference acceleration system. The experimental results show that the system achieves collaborative CNN inference among multiple edge servers with an average communication overhead of only 3 MB. Across CNN applications such as object detection and instance segmentation, the system achieves an average inference speedup of 2.76 times while the inference accuracy decreases by less than 5%.
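The abstract does not spell out how group convolution enables multi-server parallelism. The sketch below illustrates the general idea under stated assumptions: a convolution layer is split into independent channel groups, each computable by a different edge server from only its own slice of the feature map. The function name, the PyTorch setting, and the 3x3 / padding-1 configuration are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def grouped_parallel_conv(x, weight, num_servers, padding=1):
    # x:      (N, C_in, H, W) input feature map
    # weight: (C_out, C_in // num_servers, kH, kW) group-convolution weight
    # Each loop iteration stands for one edge server computing its own
    # channel group; only that slice of the input must be sent to it.
    gi = x.shape[1] // num_servers           # input channels per group
    go = weight.shape[0] // num_servers      # output channels per group
    parts = []
    for i in range(num_servers):
        x_i = x[:, i * gi:(i + 1) * gi]      # input slice for server i
        w_i = weight[i * go:(i + 1) * go]    # filters of server i's group
        parts.append(F.conv2d(x_i, w_i, padding=padding))  # remote call in the real system
    return torch.cat(parts, dim=1)           # merge the channel groups

# Sanity check: the per-server results match PyTorch's built-in grouped conv.
x = torch.randn(1, 64, 56, 56)
w = torch.randn(128, 16, 3, 3)               # 64 / 4 = 16 input channels per group
assert torch.allclose(grouped_parallel_conv(x, w, 4),
                      F.conv2d(x, w, padding=1, groups=4), atol=1e-4)
```

Because each group reads and writes disjoint channel slices, the branches have no data dependency and can run on different servers, at the cost of synchronizing once to concatenate the results.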
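The exact nonlinear program for subtask division and offloading is not given in the abstract. A minimal sketch of a formulation consistent with the stated objective, assuming m servers with compute capacity f_i and link rate r_i, n subtasks with workload shares w_j, and binary offloading variables x_ij (all symbols are assumptions), is:

```latex
\begin{aligned}
\min_{x_{ij},\, w_j} \quad & \frac{1}{n}\sum_{j=1}^{n}\sum_{i=1}^{m}
    x_{ij}\left(\frac{c\, w_j}{f_i} + \frac{d\, w_j}{r_i}\right) \\
\text{s.t.} \quad & \sum_{i=1}^{m} x_{ij} = 1 \quad \forall j, \qquad
    \sum_{j=1}^{n} w_j = W, \\
& x_{ij} \in \{0,1\}, \qquad w_j \ge 0,
\end{aligned}
```

where x_ij = 1 assigns subtask j to server i, c and d are the computation and transmission volume per unit of workload, and W is the total workload of the partitioned layer. The products of the decision variables x_ij and w_j are what make the program nonlinear and motivate a heuristic solver.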
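The heuristic algorithm itself is not described in the abstract. One simple baseline in the same spirit splits the workload in proportion to each server's effective speed (compute plus transfer), so that all subtasks finish at roughly the same time; the cost constants and server fields below are hypothetical.

```python
def proportional_split(total_work, servers, comp_cost=1.0, data_cost=1.0):
    """Divide `total_work` units among servers in proportion to their
    effective speed, so every subtask finishes at roughly the same time.

    servers: list of dicts with 'flops' (compute rate) and 'bw' (link rate);
    comp_cost / data_cost: hypothetical per-unit computation and data volume.
    """
    unit_delay = [comp_cost / s['flops'] + data_cost / s['bw'] for s in servers]
    speed = [1.0 / d for d in unit_delay]
    shares = [int(total_work * sp / sum(speed)) for sp in speed]
    shares[-1] += total_work - sum(shares)   # absorb the rounding remainder
    return shares

# Example: three heterogeneous servers share 224 rows of an output map.
servers = [{'flops': 4.0, 'bw': 2.0}, {'flops': 2.0, 'bw': 2.0},
           {'flops': 1.0, 'bw': 1.0}]
print(proportional_split(224, servers))      # e.g. [105, 79, 40]
```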
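For the convolution-decomposition technique, one common low-communication parallelization consistent with the description is to decompose a convolution spatially: each server computes one horizontal band of the output and needs only that band of the input plus a thin halo of rows, rather than the full feature map. The sketch assumes stride 1 and "same" padding (2 * pad = k - 1); it is an illustration, not the paper's decomposition.

```python
import torch
import torch.nn.functional as F

def height_split_conv(x, weight, num_servers, pad=1):
    # Each server computes a horizontal band of the output; it only needs
    # its band of the input plus `pad` halo rows from each neighbour, so
    # the data sent over the network is a thin strip, not the full map.
    h = x.shape[2]
    band = h // num_servers
    outputs = []
    for i in range(num_servers):
        lo = i * band
        hi = (i + 1) * band if i < num_servers - 1 else h
        top, bot = max(lo - pad, 0), min(hi + pad, h)
        y = F.conv2d(x[:, :, top:bot], weight, padding=pad)  # remote call in practice
        outputs.append(y[:, :, lo - top: lo - top + (hi - lo)])  # trim halo rows
    return torch.cat(outputs, dim=2)          # stitch the bands back together

# Sanity check against the undivided convolution (stride 1, 3x3 kernel).
x = torch.randn(1, 32, 60, 60)
w = torch.randn(64, 32, 3, 3)
assert torch.allclose(height_split_conv(x, w, 4),
                      F.conv2d(x, w, padding=1, atol=1e-4) if False else
                      F.conv2d(x, w, padding=1), atol=1e-4)
```

Since only the halo rows overlap between neighbouring bands, the per-server transfer volume shrinks as the number of servers grows, which matches the small average communication overhead reported in the experiments.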
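How the offloading problem is transformed for AdaBoost and Random Forest is likewise not detailed here. One plausible reading is that a learned regressor predicts subtask delay from server and workload features, and candidate offloading plans are then scored against the mean-plus-variance objective. The features, data, and scoring function below are illustrative assumptions (scikit-learn shown):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor  # AdaBoostRegressor is used the same way

# Hypothetical profiling data: (server flops, link bandwidth, subtask workload) -> delay (s).
X = np.array([[4.0, 2.0, 100], [2.0, 2.0, 100], [4.0, 2.0, 50],
              [1.0, 1.0, 50], [2.0, 2.0, 50], [1.0, 1.0, 100]])
y = np.array([0.8, 1.2, 0.4, 1.5, 0.6, 3.0])

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

def score_plan(plan):
    # Predicted mean subtask delay plus delay variance: the bi-objective
    # named in the abstract, evaluated on a candidate offloading plan.
    d = model.predict(np.array(plan))
    return d.mean() + d.var()

# Keep whichever candidate (server, workload) assignment scores lower.
plan_a = [[4.0, 2.0, 60.0], [2.0, 2.0, 60.0], [1.0, 1.0, 30.0]]
plan_b = [[4.0, 2.0, 80.0], [2.0, 2.0, 50.0], [1.0, 1.0, 20.0]]
best = min((plan_a, plan_b), key=score_plan)
```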