
Research On Distributed Inference Acceleration Technology For Convolutional Neural Networks

Posted on: 2023-01-03
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Shen
Full Text: PDF
GTID: 2558307097494914
Subject: Computer technology
Abstract/Summary:
In recent years, applications based on Convolutional Neural Networks (CNNs) have been widely deployed in many fields. CNNs are computation-intensive because they involve a large number of complex numerical calculations such as matrix multiplications and automatic differentiation. However, the computing capacity and resources of Internet of Things devices are limited: running CNNs directly on these devices imposes heavy computational pressure on them and makes it difficult to meet users' real-time requirements. Therefore, how to balance the limited computing resources of terminal devices against the huge resource demands of CNNs is one of the research hotspots in the field of edge intelligence. A popular solution is to divide a CNN by model partitioning and then complete inference collaboratively across multiple devices. However, increasingly complex network structures and dynamic environments bring many challenges to collaborative inference. To address these challenges, this paper studies the model partitioning algorithm and the collaborative inference framework in depth. The main research contents are summarized as follows:

To improve CNN inference speed on resource-limited devices, this paper proposes an on-demand fine-granularity partitioning method (OFPM). First, a partitioning method (FPM) for homogeneous environments is proposed, which combines horizontal and vertical partitioning. FPM divides a convolution layer into multiple convolution-layer partitions by horizontal partitioning, and determines the optimal execution device for each layer by vertical partitioning, so as to minimize the overall inference latency (a minimal code sketch of the horizontal split is given after this abstract). Second, considering the heterogeneity of devices and the dynamic nature of networks, an on-demand partitioning strategy (OPM) is designed for heterogeneous environments, and OPM is combined with FPM to form OFPM. OFPM automatically adjusts the height of the input feature-map slice of each convolution-layer partition according to device capabilities and network conditions, balancing the computation across devices as evenly as possible and further improving CNN inference speed (an illustrative height-allocation sketch also follows).

To coordinate devices in heterogeneous edge environments and obtain good end-to-end performance, this paper proposes a layer latency prediction model (FCPM) based on floating-point operations and CPU load, together with a distributed collaborative inference framework that supports fine-grained partitioning (DCFP). DCFP consists of three phases: offline training, online optimization, and collaborative inference. In the offline training phase, one set of FCPM models is trained for each device (an illustrative regression sketch follows this abstract). In the online optimization phase, model partitioning is performed according to the layer latencies predicted by FCPM and the network bandwidth, generating partition decisions adapted to the devices' computing capabilities and the network conditions. In the collaborative inference phase, all heterogeneous devices are coordinated according to the partition decisions to complete the inference jointly.

Finally, this paper uses common regression algorithms to verify the effectiveness of FCPM, and simulates homogeneous and heterogeneous environments to conduct extensive experiments on FPM and OFPM, respectively. Experimental results show that FCPM achieves a prediction accuracy of at least 88%, and that OFPM improves inference speed by 1 to 2.54 times compared with state-of-the-art partitioning methods.
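The abstract does not publish FPM's code, but the horizontal-partitioning idea it describes (splitting a convolution layer's output into bands of rows and giving each device only the input rows, plus halo, that its band depends on) can be illustrated with a minimal PyTorch sketch. The function name split_conv_by_height, the stride-1 setting, the square kernel, and the padding handling are assumptions made for illustration, not the thesis' actual algorithm:

```python
import torch
import torch.nn.functional as F

def split_conv_by_height(x, weight, bias, num_parts, padding=1):
    # Horizontal partitioning of one convolution layer: each part computes a
    # band of output rows from only the input rows (plus halo) it depends on.
    # Assumes stride 1 and bands taller than the padding; illustrative only.
    k = weight.shape[2]                          # kernel height
    H = x.shape[2]                               # input height
    out_h = H + 2 * padding - k + 1              # output height at stride 1
    bounds = [round(i * out_h / num_parts) for i in range(num_parts + 1)]
    bands = []
    for r0, r1 in zip(bounds[:-1], bounds[1:]):
        in0 = max(r0 - padding, 0)               # input rows needed for
        in1 = min(r1 + k - 1 - padding, H)       # output rows [r0, r1)
        slab = x[:, :, in0:in1, :]
        pad_top = padding if in0 == 0 else 0     # zero-pad only at real borders
        pad_bot = padding if in1 == H else 0
        slab = F.pad(slab, (padding, padding, pad_top, pad_bot))
        bands.append(F.conv2d(slab, weight, bias))
    return torch.cat(bands, dim=2)               # stitch the output bands
```

In a collaborative setting each slab would be sent to a different device; concatenating the resulting bands along the height dimension reproduces the output of running the full convolution on a single device.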
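The on-demand adjustment of slice heights in OPM is described only at a high level. Under the assumption that each device's share should be proportional to its effective speed, a toy allocation helper might look as follows (allocate_band_heights and the speed numbers are hypothetical; the thesis' OPM also accounts for network conditions, which this sketch ignores):

```python
def allocate_band_heights(out_h, device_speeds):
    # Split out_h output rows across devices in proportion to their estimated
    # speed (e.g. peak FLOPs/s discounted by current CPU load), so the devices
    # finish their bands at roughly the same time. Transmission cost omitted.
    total = sum(device_speeds)
    heights = [int(out_h * s / total) for s in device_speeds]
    heights[-1] += out_h - sum(heights)   # give the rounding remainder to the last device
    return heights

# e.g. three devices whose effective speeds differ by a factor of four
print(allocate_band_heights(224, [1.0, 2.0, 4.0]))   # -> [32, 64, 128]
```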
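The abstract states that FCPM predicts per-layer latency from floating-point operations and CPU load and is verified with common regression algorithms, without naming the regression form. The sketch below assumes a plain linear regression with scikit-learn; the FLOPs, load, and latency arrays are placeholder values for illustration, not measurements from the thesis:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative per-layer latency model in the spirit of FCPM: predict a layer's
# execution time on one device from its FLOP count and the device's CPU load.
flops    = np.array([1.2e8, 4.5e8, 9.0e8, 1.8e9, 3.6e9])   # layer FLOPs (placeholder)
cpu_load = np.array([0.10, 0.35, 0.20, 0.60, 0.45])        # fractional CPU load (placeholder)
latency  = np.array([2.1, 8.4, 14.9, 52.0, 78.3])          # measured ms (placeholder)

X = np.column_stack([flops, cpu_load])
model = LinearRegression().fit(X, latency)   # one such model per device, per the thesis

# Online phase: predict the latency of a 2 GFLOP layer on a 50%-loaded device.
print(model.predict(np.array([[2.0e9, 0.5]])))
```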
Keywords/Search Tags:Edge computing, Deep neural networks, Edge intelligence, Collaborative inference, Model partitioning