| With the rapid development of artificial intelligence (AI) and intelligent terminal devices, deploying AI on these devices has become an important route to the wide application of AI technology. The fusion of AI and edge computing, known as edge intelligence, allows AI to extend gradually from the central cloud node to the edge, closer to the data source and the business site, forming a cloud-edge-end collaborative edge intelligence architecture. However, research on deep model inference with cloud-edge collaboration still faces several issues, such as accuracy reduction caused by model compression, model partition that cannot adapt to dynamic changes in real time, and uneven resource allocation in multi-user scenarios. To address these issues, this thesis studies real-time adaptive model partition in dynamic networks and multi-user scenarios. The main contributions are summarized as follows: (1) A real-time adaptive partition collaborative inference framework (RAP) is proposed, and the framework and its components are designed. RAP integrates model compression and model partition, enabling weighted joint optimization of the accuracy loss caused by model compression and the inference latency of model partition. RAP models the weights of loss and latency as a function of bandwidth, so that model split points are optimized adaptively in dynamically changing networks, which makes partition-based collaborative inference adaptive. Furthermore, RAP optimizes the decision space and proposes an optimization algorithm that enables fast decision-making in dynamic networks, reducing both decision latency and inference latency. This gives partition-based collaborative inference real-time performance. Detailed comparative experiments and result analysis were conducted in a real-world scenario. The results show that, compared with several advanced algorithms, the proposed RAP framework brings performance improvements of 2.98× and 1.37% in inference latency
and inference accuracy, respectively. (2) A joint multiuser model partition and resource allocation algorithm, JM-MPRA, is proposed. Considering scenarios with multiple user devices and resource allocation on the edge server, this thesis integrates the JM-MPRA algorithm into the RAP framework to jointly optimize, through iterative optimization, the allocation of computing resources, the allocation of communication resources, and the model split point decision for each user. Furthermore, an initialization algorithm for the resource allocation vector, RAVI, is proposed to pre-schedule the computing and communication resources of edge servers, which reduces the number of iterations required by the JM-MPRA algorithm and speeds up the optimization. The experimental results show that the JM-MPRA algorithm can reduce inference latency by up to 2.54× compared with other methods. |
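
The bandwidth-dependent weighting described in contribution (1) can be illustrated with a minimal Python sketch. It is not the RAP implementation: the abstract gives no formulas, so the cost form, the weight function `loss_weight`, the latency model, the normalization constant, and every name and number below are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class SplitCandidate:
    """One hypothetical split point with illustrative, made-up measurements."""
    split_index: int      # number of layers executed on the device
    device_ms: float      # compute time on the device
    server_ms: float      # compute time on the edge server
    payload_kbit: float   # size of the (compressed) data sent to the server
    accuracy_loss: float  # accuracy loss from compression, in percentage points


def latency_ms(c: SplitCandidate, bandwidth_kbps: float) -> float:
    """End-to-end latency: device compute + transfer + server compute."""
    transfer_ms = 1000.0 * c.payload_kbit / bandwidth_kbps
    return c.device_ms + transfer_ms + c.server_ms


def loss_weight(bandwidth_kbps: float, pivot_kbps: float = 1000.0) -> float:
    """Assumed weight on accuracy loss, in (0, 1), increasing with bandwidth.

    The intuition assumed here: on slow links latency dominates the decision,
    on fast links accuracy does. The functional form is a guess.
    """
    return bandwidth_kbps / (bandwidth_kbps + pivot_kbps)


def choose_split(candidates, bandwidth_kbps, latency_scale_ms=100.0):
    """Pick the candidate minimizing w*loss + (1-w)*normalized latency."""
    w = loss_weight(bandwidth_kbps)

    def cost(c: SplitCandidate) -> float:
        normalized_latency = latency_ms(c, bandwidth_kbps) / latency_scale_ms
        return w * c.accuracy_loss + (1.0 - w) * normalized_latency

    return min(candidates, key=cost)


# Illustrative candidates: all on server, split in the middle, all on device.
CANDIDATES = [
    SplitCandidate(0, 0.0, 20.0, 1200.0, 0.0),
    SplitCandidate(5, 30.0, 10.0, 100.0, 0.1),
    SplitCandidate(10, 120.0, 0.0, 0.0, 0.0),
]

if __name__ == "__main__":
    for bw in (500.0, 5000.0, 50000.0):
        best = choose_split(CANDIDATES, bw)
        print(f"{bw:>8.0f} kbps -> split after layer {best.split_index}")
```

With these made-up numbers the chosen split moves from fully on-device at low bandwidth, to a mid-network split, to fully on-server at high bandwidth, which is the kind of adaptive behavior the abstract describes. The exhaustive `min` over candidates stands in for the faster decision algorithm that the thesis proposes but that the abstract does not specify.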