
Research And Implementation Of Execution Optimization System For Deep Learning Applications

Posted on: 2021-09-26
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Yang
Full Text: PDF
GTID: 2518306557487414
Subject: Computer technology

Abstract/Summary:
In recent years, artificial intelligence has achieved notable success. A technology called deep learning has been proposed, which makes use of deep neural networks to perform pattern recognition and data analysis. Deep neural networks are loosely modelled on the human brain and perform well on complex tasks. Many applications, such as face detection, machine translation, and speech recognition, are built on deep neural networks.

The life cycle of a deep learning application consists of two phases: training and inference. During training, neural networks learn from data and their weights are updated; this process is compute-intensive and takes a long time to complete. Inference is the production phase, in which trained models are deployed to make predictions on real-world data. The effectiveness of inference is measured by two metrics, accuracy and latency, and it is difficult to achieve high accuracy and low latency simultaneously. To optimize the execution of deep learning applications, two issues must therefore be addressed: how to accelerate model training and how to improve inference effectiveness. This thesis revolves around these two issues, and its main achievements are as follows.

Firstly, this thesis proposes a model-aware parallelization strategy for the distributed training of deep neural networks, which consists of two steps. The first step is model profiling, which estimates the size of the parameters and output data of each layer using the formulas summarized in the third chapter. The second step is strategy making, which uses the model information collected in the previous step to analyze the time overhead and selects the best strategy with particle swarm optimization.

Secondly, this thesis proposes a task scheduling strategy oriented to the heterogeneous requirements of inference tasks, which also consists of two steps. The first step is task offloading, in which each task is dispatched to a server in consideration of server load, server performance, and the task's deadline. The second step is task scheduling, in which each server decides which model to use for each task and determines the task order in consideration of the tasks' time sensitivity and accuracy sensitivity.

Finally, this thesis designs and implements a prototype system on the SEU Cloud platform, putting the theoretical results into practice. Experimental results show that the proposed model-aware parallelization strategy reduces the time overhead of training, and that the task-oriented scheduling strategy improves the success rate and accuracy of inference.
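To make the model-profiling step concrete, the sketch below estimates per-layer parameter and output sizes. The thesis derives its own formulas in the third chapter, which are not reproduced here; the calculations below are the standard size computations for convolutional and fully connected layers, and all function names and arguments are hypothetical.

# Minimal model-profiling sketch. The thesis uses its own Chapter 3
# formulas; the sizes below are the textbook calculations for conv and
# fully connected layers, included only for illustration.

def conv_layer_sizes(k, c_in, c_out, h_out, w_out, bytes_per_elem=4):
    """Return (parameter_bytes, output_bytes) for a convolutional layer.

    Parameters: k*k*c_in*c_out weights plus c_out biases.
    Output activation: h_out * w_out * c_out elements.
    """
    params = (k * k * c_in * c_out + c_out) * bytes_per_elem
    output = h_out * w_out * c_out * bytes_per_elem
    return params, output

def fc_layer_sizes(n_in, n_out, bytes_per_elem=4):
    """Return (parameter_bytes, output_bytes) for a fully connected layer."""
    params = (n_in * n_out + n_out) * bytes_per_elem
    output = n_out * bytes_per_elem
    return params, output

# Example: first conv layer of a VGG-style network on a 224x224 RGB image.
p, o = conv_layer_sizes(k=3, c_in=3, c_out=64, h_out=224, w_out=224)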
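The strategy-making step can be illustrated with a minimal particle swarm optimization loop. The objective below is a stand-in, not the thesis's time-overhead model: it charges parameter-synchronization traffic for layers run data-parallel and activation-transfer traffic for layers run model-parallel. The sketch only shows how PSO can search over per-layer parallelization choices.

import random

def time_cost(strategy, layer_params, layer_outputs):
    """Hypothetical communication cost of a per-layer parallelization plan.

    strategy[i] < 0.5 means data parallelism for layer i (synchronize its
    parameters); strategy[i] >= 0.5 means model parallelism (transfer its
    output activations).
    """
    cost = 0.0
    for s, p, o in zip(strategy, layer_params, layer_outputs):
        cost += p if s < 0.5 else o
    return cost

def pso(objective, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    """Standard particle swarm optimization over [0, 1]^dim."""
    pos = [[random.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = pbest_val.index(min(pbest_val))
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Hypothetical per-layer sizes (MB) from the profiling step.
params  = [0.2, 4.7, 9.4, 102.0]
outputs = [12.8, 6.4, 3.2, 0.02]
best, cost = pso(lambda s: time_cost(s, params, outputs), dim=len(params))
plan = ["data" if s < 0.5 else "model" for s in best]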
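For the inference side, the following sketch shows one plausible shape of the two-step strategy, under assumed names and numbers: offloading dispatches each task to the server with the earliest estimated finish time given its current load, performance, and the task's deadline, while scheduling orders tasks by deadline and upgrades accuracy-sensitive tasks to a slower, more accurate model when slack permits. The field names, cost constants, and fast/accurate model pair are assumptions, not details taken from the thesis.

from dataclasses import dataclass

@dataclass
class Task:
    deadline: float           # seconds from now
    accuracy_sensitive: bool  # prefer the accurate model when slack allows

@dataclass
class Server:
    speed: float              # relative performance factor
    queue_time: float = 0.0   # current load: pending work in seconds

# Hypothetical per-task compute costs (seconds at speed 1.0).
FAST_COST, ACCURATE_COST = 0.05, 0.20

def offload(task, servers):
    """Dispatch to the server with the earliest estimated finish time,
    skipping servers that would already miss the task's deadline."""
    def finish(s):
        return s.queue_time + FAST_COST / s.speed
    feasible = [s for s in servers if finish(s) <= task.deadline] or servers
    best = min(feasible, key=finish)
    best.queue_time += FAST_COST / best.speed
    return best

def schedule(server, tasks):
    """Order tasks by deadline; use the accurate model only when the task
    is accuracy-sensitive and its slack covers the extra compute time."""
    plan, clock = [], 0.0
    for t in sorted(tasks, key=lambda t: t.deadline):
        use_acc = (t.accuracy_sensitive and
                   clock + ACCURATE_COST / server.speed <= t.deadline)
        cost = ACCURATE_COST if use_acc else FAST_COST
        clock += cost / server.speed
        plan.append((t, "accurate" if use_acc else "fast"))
    return plan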
Keywords/Search Tags:deep learning, neural networks, distributed training, inference, task scheduling