
Research On Resource Scheduling Of Deep Learning Tasks In TensorFlow Platform

Posted on: 2019-03-03
Degree: Master
Type: Thesis
Country: China
Candidate: J W Yi
Full Text: PDF
GTID: 2348330563454001
Subject: Computer application technology
Abstract/Summary:
Since AlphaGo's matches against human players, artificial intelligence has developed rapidly. Deep learning brings convenience in many areas, such as speech assistants in speech recognition, face recognition in computer vision, and machine translation in natural language processing. Deep learning requires constructing neural networks and extracting features from training data. Drawing on the structure of the human brain, it trains multi-layer neural networks on large-scale data, so that successive levels of training yield increasingly abstract representations of the data's characteristics. These abstractions help solve complex problems and open broad prospects for deep learning. Major technology companies have open-sourced their deep learning frameworks, such as Caffe, Torch, and MXNet; among them, Google's TensorFlow offers high availability, a simple workflow, and active community support.

TensorFlow alone, however, is not enough to train deep learning applications. The full pipeline must also account for data storage and processing, resource management and scheduling, and application deployment. Moreover, the training process consumes substantial computing resources and therefore demands more reasonable scheduling. For this purpose, it is necessary to build a cloud deep learning platform on cloud computing technologies, unifying resource management through virtualization and ensuring the normal operation of deep learning applications by integrating the various frameworks.

To improve resource utilization during deep learning training, this thesis proceeds along two lines:

(1) An improved particle swarm optimization algorithm is applied to the mapping of virtual machines to physical machines on the cloud deep learning platform. As a heuristic algorithm, particle swarm optimization is well suited to combinatorial optimization problems. By increasing swarm diversity and tuning parameter settings, the convergence speed and solution precision of the algorithm are improved. Based on the resource requirements of TensorFlow deep learning tasks, especially their GPU requirements, a resource scheduling model aimed at improving resource utilization is constructed, and the improved particle swarm optimization (PSO) algorithm is applied to this model to solve the virtual machine placement problem in the cloud environment.

(2) A resource scheduling strategy based on the running status of GPU devices is proposed for deep learning training on GPU servers. At present, TensorFlow training on a GPU server relies on manual, static allocation of the main resource, the GPU; with multiple tasks and multiple GPUs, some GPUs sit idle. To address this, benchmark tests were conducted, TensorFlow's use of GPUs was examined, and GPU device operating data were collected and analyzed; a GPU resource scheduling strategy for TensorFlow deep learning tasks was then proposed to increase GPU utilization and shorten the overall completion time of a batch of tasks.

On the CloudSim simulation platform, the virtual machine placement scheduling algorithm was verified and shown to substantially improve resource utilization. The GPU scheduling strategy reduced the completion time of a batch of tasks trained on the GPU server and improved GPU device utilization.
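The thesis's improved PSO is not detailed in the abstract. As a rough illustration of the general idea, the sketch below applies a basic discrete PSO to the VM-to-host mapping: each particle encodes an assignment of VMs to physical machines, and the fitness minimizes the number of active hosts (a proxy for resource utilization) with a penalty for capacity violations. The demands, capacities, penalty weight, and PSO parameters are all hypothetical, not taken from the thesis.

```python
import random

def pso_vm_placement(vm_demands, host_capacity, n_hosts,
                     n_particles=30, iters=200, seed=0):
    """Assign each VM (resource demand, e.g. GPU slots) to a host,
    minimizing active hosts while respecting host capacity."""
    rng = random.Random(seed)
    n_vms = len(vm_demands)

    def fitness(pos):
        load = [0] * n_hosts
        for vm, host in enumerate(pos):
            load[host] += vm_demands[vm]
        active = sum(1 for l in load if l > 0)
        overload = sum(max(0, l - host_capacity) for l in load)
        return active + 10 * overload  # heavy penalty for violations

    # Swarm state: positions are host indices, velocities are reals.
    positions = [[rng.randrange(n_hosts) for _ in range(n_vms)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * n_vms for _ in range(n_particles)]
    pbest = [p[:] for p in positions]
    pbest_fit = [fitness(p) for p in positions]
    g = pbest_fit.index(min(pbest_fit))
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive and social factors
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_vms):
                r1, r2 = rng.random(), rng.random()
                velocities[i][d] = (w * velocities[i][d]
                                    + c1 * r1 * (pbest[i][d] - positions[i][d])
                                    + c2 * r2 * (gbest[d] - positions[i][d]))
                # Round and clip the continuous update back to a host index.
                new = int(round(positions[i][d] + velocities[i][d]))
                positions[i][d] = min(n_hosts - 1, max(0, new))
            f = fitness(positions[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = positions[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = positions[i][:], f
    return gbest, gbest_fit

# Hypothetical example: six VMs with GPU demands, four hosts of 4 GPUs each.
placement, fit = pso_vm_placement([2, 2, 1, 1, 3, 3], host_capacity=4, n_hosts=4)
```

The thesis improves on this baseline by maintaining swarm diversity and adjusting parameters to speed convergence; a production version would also use multi-dimensional demands (CPU, memory, GPU) rather than the single scalar shown here.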
Keywords/Search Tags:Deep learning, Cloud computing, Scheduling of resources, Particle swarm optimization