Real-time Adaptive Deep Inference For Cloud-edge Collaboration

Posted on:2024-03-06

Degree:Master

Type:Thesis

Country:China

Candidate:Z Kou

GTID:2568307172496644

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of artificial intelligence (AI) and intelligent terminal devices, deploying AI on those devices has become an important route to the wide application of AI technology. The fusion of AI and edge computing, known as edge intelligence, lets AI extend gradually from the cloud center node to the edge, closer to the data source and the business site, forming a cloud-edge-end collaborative edge intelligence architecture. However, research on deep model inference with cloud-edge collaboration still faces several issues: accuracy reduction caused by model compression, model partition that cannot adapt to dynamic changes in real time, and uneven resource allocation in multi-user scenarios. To address these issues, this thesis studies real-time adaptive model partition in dynamic networks and in multi-user scenarios. The main contributions are summarized below.

(1) A real-time adaptive partition collaborative inference framework (RAP) is proposed, and the framework and its components are designed. RAP integrates model compression with model partition, enabling weighted joint optimization of the accuracy loss caused by model compression and the inference latency of model partition. RAP models the weights of loss and latency as a function of bandwidth, so that model split points are optimized adaptively as the network changes, which makes collaborative inference with model partition adaptive. RAP also reduces the decision space and proposes an optimization algorithm for fast decision-making in dynamic networks, lowering both decision latency and inference latency, which makes collaborative inference with model partition real-time. Detailed comparative experiments and result analysis were conducted in a real-world scenario. The results show that, compared with several advanced algorithms, the proposed RAP framework improves inference latency by 2.98x and inference accuracy by 1.37%.

(2) A joint multi-user model partition and resource allocation algorithm, JM-MPRA, is proposed. For the scenario of multiple user equipments sharing the resources of an edge server, this thesis integrates the JM-MPRA algorithm into the RAP framework to jointly optimize, through iterative optimization, the allocation of computing resources, the allocation of communication resources, and the choice of model split point for each user. In addition, an initialization algorithm for the resource allocation vector, RAVI, is proposed to pre-schedule the computing and communication resources of edge servers, which reduces the number of iterations required by JM-MPRA and speeds up the optimization. The experimental results show that the JM-MPRA algorithm can reduce inference latency by up to 2.54x compared with other methods.
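The bandwidth-dependent weighting described in contribution (1) can be illustrated with a minimal Python sketch. This is not the thesis's RAP algorithm: the abstract does not give the weight function, the cost normalization, or any profiling data, so `latency_weight`, the scale constants, and every number in `CANDIDATES` are hypothetical values chosen only to show the idea of trading accuracy loss against latency as bandwidth changes.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    split: int        # layer index at which the model is cut
    device_ms: float  # on-device compute time before the split
    edge_ms: float    # edge/cloud compute time after the split
    tx_bytes: float   # size of the (compressed) data sent over the network
    acc_loss: float   # accuracy loss from compression at this split (fraction)


def latency_weight(bandwidth_mbps: float, pivot_mbps: float = 10.0) -> float:
    """Hypothetical weight on latency: close to 1 on slow links, close to 0 on fast ones."""
    return pivot_mbps / (pivot_mbps + bandwidth_mbps)


def total_latency_ms(c: Candidate, bandwidth_mbps: float) -> float:
    """Device compute + transmission + edge compute, in milliseconds."""
    tx_ms = c.tx_bytes * 8 / (bandwidth_mbps * 1e6) * 1e3
    return c.device_ms + tx_ms + c.edge_ms


def choose_split(cands, bandwidth_mbps, latency_scale_ms=100.0, loss_scale=0.05):
    """Pick the candidate minimizing a weighted sum of normalized latency and loss."""
    w = latency_weight(bandwidth_mbps)

    def cost(c: Candidate) -> float:
        latency_term = total_latency_ms(c, bandwidth_mbps) / latency_scale_ms
        loss_term = c.acc_loss / loss_scale
        return w * latency_term + (1.0 - w) * loss_term

    return min(cands, key=cost)


# Made-up profiling table: later splits send less data but lose more accuracy.
CANDIDATES = [
    Candidate(split=0, device_ms=0.0, edge_ms=10.0, tx_bytes=600_000, acc_loss=0.00),
    Candidate(split=5, device_ms=30.0, edge_ms=6.0, tx_bytes=50_000, acc_loss=0.01),
    Candidate(split=10, device_ms=80.0, edge_ms=0.0, tx_bytes=1_000, acc_loss=0.03),
]

if __name__ == "__main__":
    for bw in (1.0, 10.0, 100.0):
        best = choose_split(CANDIDATES, bw)
        print(f"{bw:6.1f} Mbps -> split at layer {best.split}")
```

With this toy table, a slow link pushes the decision toward a late split that transmits little data, while a fast link favors an early split that avoids the accuracy loss. Scanning a small, pre-profiled candidate list is one simple way to keep the decision itself cheap; the thesis's own decision-space reduction may work differently.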

Keywords/Search Tags:

Edge Intelligence, Cloud-edge Collaborative Inference, Model Partition, Resource Allocation

Related items

1	Edge Intelligence: Research On Cloud-Edge-End DNN Collaborative Inference Acceleration Technology
2	Research On The Optimization Scheme Of Inference Stage Based On Model Partition Under Edge Intelligence
3	Research And Implementation Of Synergistic Inference Acceleration For Edge Intelligence Applications
4	Research On Cloud-edge Joint Task Inference And Model Collaborative Training In Edge Intelligence
5	Research On Resource Allocation And Task Scheduling For Edge Intelligence
6	Research On Edge-device Cooperative Inference Mechanism Based On Neural Network Model Simplification And Partition
7	Research On Cloud-edge Collaborative Task Offloading Methods For Artificial Intelligence Image Recognition
8	Dynamic Quick Partitioning Strategy Of DNN For End-edge Collaborative Inference
9	Research On Joint Resource Management Technology For Edge Intelligence
10	Research On Cloud-Edge-End Collaborative Application Offloading Mechanism