
Research On Collaborative Inference Optimization Technology For DNN In Edge Computing

Posted on: 2023-09-04    Degree: Master    Type: Thesis
Country: China    Candidate: W Y Xu    Full Text: PDF
GTID: 2558306845499464    Subject: Computer Science and Technology

Abstract/Summary:
In recent years, the Internet of Things (IoT) has generated massive amounts of data, and cloud computing alone cannot meet the real-time demands of IoT applications, so edge computing has been widely adopted. At the same time, IoT data have grown increasingly diverse and artificial intelligence (AI) technology has seen wide application, both of which make the need to process IoT data with AI in edge computing increasingly pressing. However, the Deep Neural Network (DNN), the critical backbone technology of AI, is difficult to deploy on resource-constrained edge devices because of its high inference cost.

To solve this issue, a variety of DNN inference optimization techniques have been proposed; by the devices participating in inference, they fall into two categories: single-device and collaborative. Single-device techniques mainly compress or adjust model weights, but their speedup is limited by the computational resources of one device, and extreme DNN compression causes significant performance loss. This thesis therefore focuses on collaborative inference optimization techniques.

Collaborative inference mainly involves two typical scenarios: multi-edge collaboration and edge-cloud collaboration. On the one hand, multi-edge collaboration is constrained by the resource heterogeneity of edge devices: how to reduce DNN inference cost so that more devices can participate in collaborative inference, and how to partition the workload so that the devices' execution times are balanced, are two key problems that hinder efficient collaborative inference. On the other hand, when multiple edge devices compete for the limited resources of an edge server in edge-cloud collaboration, the major challenge is how to formulate an optimal execution strategy for each device that guarantees the task completion rate. To address these problems and challenges, this thesis explores and improves collaborative inference optimization technology. The main contributions are as follows:

First, this thesis presents an Edge Distributed Inference (EdgeDI) framework that takes both DNN model complexity and workload-partitioning fairness into consideration in multi-edge collaboration. To improve performance, EdgeDI exploits two key optimization knobs: (1) a model transformation scheme, consisting of a lossless pruning method and a Squeeze-Convolution-Attention-Restore (SCAR) convolutional replacement block, which transforms the collaborative model into a compact model with lower inference resource cost; and (2) distributed inference based on an Optimal One-Dimensional Partition (OODP) algorithm, which adaptively balances the workload among the collaborating devices so that they achieve similar execution times under heterogeneous resource conditions. The experimental results indicate that the model transformation scheme significantly reduces model parameters and memory consumption without loss of model performance, that the OODP algorithm achieves a higher inference speedup ratio than other mainstream solutions, and that, benefiting from the performance of both schemes, EdgeDI can approach or even exceed the theoretical bounds of parallel computing.
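As an illustration of the first optimization knob, the sketch below shows what a squeeze-convolution-attention-restore replacement block might look like in PyTorch. The abstract does not describe SCAR's internal structure, so every design choice here (the 1x1 squeeze and restore convolutions, the SE-style channel attention, the squeeze ratio) is an assumption, not the thesis's actual block.

```python
import torch.nn as nn

class SCARBlock(nn.Module):
    """Hypothetical Squeeze-Convolution-Attention-Restore block.

    Replaces a standard 3x3 convolution with a cheaper pipeline:
    squeeze channels (1x1 conv), convolve in the narrow channel
    space (3x3 conv), reweight channels with SE-style attention,
    then restore the channel count (1x1 conv).
    """

    def __init__(self, c_in, c_out, squeeze_ratio=4):
        super().__init__()
        c_mid = max(c_in // squeeze_ratio, 8)
        self.squeeze = nn.Conv2d(c_in, c_mid, kernel_size=1, bias=False)
        self.conv = nn.Conv2d(c_mid, c_mid, kernel_size=3, padding=1, bias=False)
        self.attention = nn.Sequential(          # SE-style channel attention
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c_mid, c_mid, kernel_size=1),
            nn.Sigmoid(),
        )
        self.restore = nn.Conv2d(c_mid, c_out, kernel_size=1, bias=False)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        y = self.act(self.squeeze(x))
        y = self.act(self.conv(y))
        y = y * self.attention(y)                # channel-wise reweighting
        return self.restore(y)

# As a drop-in for nn.Conv2d(256, 256, 3, padding=1): roughly 0.07M
# parameters here versus roughly 0.59M for the standard convolution.
```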
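For the second knob, the abstract states only OODP's goal (similar execution times across heterogeneous devices), not its formulation. The minimal sketch below splits a one-dimensional workload, such as the rows of an input feature map, in proportion to per-device throughputs; the `partition_workload` helper, its inputs, and the greedy leftover assignment are illustrative stand-ins, not the OODP algorithm itself.

```python
def partition_workload(total_rows, speeds):
    """Split total_rows units of work across devices so that their
    execution times come out roughly equal.

    A stand-in for the Optimal One-Dimensional Partition (OODP)
    algorithm; speeds[i] is device i's assumed throughput (rows/s).
    """
    total_speed = sum(speeds)
    # Fractional fair share per device, rounded down.
    shares = [int(total_rows * s / total_speed) for s in speeds]
    # Hand each leftover row to the device that would finish soonest.
    for _ in range(total_rows - sum(shares)):
        i = min(range(len(speeds)), key=lambda j: (shares[j] + 1) / speeds[j])
        shares[i] += 1
    return shares  # shares[i] rows go to device i

# Example: 224 input rows over three heterogeneous edge devices.
print(partition_workload(224, speeds=[10.0, 25.0, 40.0]))  # [30, 75, 119]
```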
Second, this thesis proposes a Greedy and Dynamic Programming (GRDP) framework, which formulates optimal collaborative policies for edge-cloud collaboration tasks under different constraints when multiple edge devices compete for server resources. GRDP contains two collaborative scheduling algorithms, for single-branch (GRDP-S) and multi-branch (GRDP-M) models respectively. GRDP-S sets the optimal partition-point selection criterion to the minimum server execution time, then combines greedy and dynamic programming methods to determine the best partition point on the original collaborative model for each device. When the collaborative model allows minor modification and retraining is feasible, GRDP-M introduces an early-exit scheme based on a multi-branch model; to improve the scheduling algorithm's performance, GRDP-M designs a Multi-Jensen-Shannon (MJS) self-distillation training method to train the multi-branch model. The experimental results show that GRDP-S guarantees the task completion rate more efficiently than other solutions; because the multi-branch model mitigates competition for computational resources through local task execution and provides more collaborative partition points, GRDP-M further improves the task completion rate.
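To make the GRDP-S partition-point criterion concrete, the sketch below enumerates the candidate cut points for a single device and, among those that meet the task deadline, returns the one with the minimum server execution time. It deliberately omits the greedy-plus-dynamic-programming scheduling across multiple competing devices; the function name and the profiled-latency inputs are hypothetical.

```python
def best_partition_point(device_lat, server_lat, tx_lat, deadline):
    """Pick the cut point k: layers [0, k) run on the device,
    layers [k, n) run on the server.

    Among cuts whose end-to-end latency meets the deadline, prefer
    the one consuming the least server time, so the shared server
    can admit more competing devices (the GRDP-S criterion).
    device_lat[i] / server_lat[i]: profiled latency of layer i on
    each side; tx_lat[k]: time to ship the tensor at cut k (the raw
    input for k = 0, layer k-1's output otherwise).
    """
    n = len(device_lat)
    dev_prefix = [0.0] * (n + 1)   # device time for layers [0, k)
    srv_suffix = [0.0] * (n + 1)   # server time for layers [k, n)
    for i in range(n):
        dev_prefix[i + 1] = dev_prefix[i] + device_lat[i]
    for i in range(n - 1, -1, -1):
        srv_suffix[i] = srv_suffix[i + 1] + server_lat[i]

    best, best_srv = None, float("inf")
    for k in range(n + 1):         # k = n keeps everything local
        tx = tx_lat[k] if k < n else 0.0
        total = dev_prefix[k] + tx + srv_suffix[k]
        if total <= deadline and srv_suffix[k] < best_srv:
            best, best_srv = k, srv_suffix[k]
    return best                    # None if no cut meets the deadline

# Example: a 4-layer model under a 50 ms deadline.
print(best_partition_point(device_lat=[5, 10, 20, 30],
                           server_lat=[1, 2, 4, 6],
                           tx_lat=[8, 6, 3, 1], deadline=50))  # 3
```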
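Finally, GRDP-M's two ingredients, early-exit inference and Jensen-Shannon self-distillation, can be sketched as follows. The thesis's exact MJS formulation is not given in the abstract; this version, which pulls each branch's distribution toward the mean of all branches, is only one plausible reading, and the `branches` interface is hypothetical.

```python
import torch
import torch.nn.functional as F

def js_self_distillation_loss(branch_logits, labels, alpha=0.5):
    """Sketch of a Multi-Jensen-Shannon style self-distillation loss:
    every early-exit branch is trained on the ground truth plus a JS
    divergence pulling it toward the mean branch distribution."""
    probs = [F.softmax(z, dim=1) for z in branch_logits]
    mean_p = torch.stack(probs).mean(dim=0)      # ensemble "teacher"
    ce = sum(F.cross_entropy(z, labels) for z in branch_logits)
    js = 0.0
    for p in probs:                              # JS(p, mean_p) per branch
        m = 0.5 * (p + mean_p)
        js = js + 0.5 * (F.kl_div(m.log(), p, reduction="batchmean")
                         + F.kl_div(m.log(), mean_p, reduction="batchmean"))
    return ce + alpha * js

def early_exit(branches, x, threshold=0.9):
    """Run exits in order (batch size 1 assumed); stop at the first
    confident prediction. Each callable is assumed to run the shared
    backbone up to its own exit head and return class logits."""
    for branch in branches:
        logits = branch(x)
        if F.softmax(logits, dim=1).max().item() >= threshold:
            return logits
    return logits                                # final exit falls through
```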
Keywords/Search Tags:Edge Computing, Deep Neural Network, Multi-edge Collaboration, Edge-cloud Collaboration, Model Compression, Knowledge Distillation