Design And Implementation Of Task Management System For Deep Learning Based On Kubernetes

Posted on: 2021-06-09
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Guo
GTID: 2518306047488224
Subject: Master of Engineering
Abstract/Summary:
With the rapid development of emerging technologies such as artificial intelligence, a large number of enterprises and universities have increased their investment in deep learning, and more and more non-professional users also wish to apply deep learning methods. Reducing the difficulty of applying deep learning for these users has become a growing demand. To make deep learning technology convenient for non-professional users, this thesis develops a deep learning task management system based on Kubernetes. Its functions include management, scheduling, and monitoring of cluster computing resources, as well as deep learning model design that simplifies user operations.

(1) In terms of resource management and scheduling, this thesis extends the default scheduling algorithm of Kubernetes and implements a scheme that dynamically adjusts the priority of user tasks by combining each user's priority in the cluster with that user's historical usage of computing resources. The design goal is to allocate computing resources reasonably among the tasks of all users in the cluster. This solves the problem that some user tasks may never be scheduled because the user's priority is too low, and thereby ensures fairness among users in the cluster. Experiments show that, as designed, the priority of user tasks is adjusted dynamically according to how long each user has historically used computing resources.

(2) In terms of cluster monitoring, this thesis uses Exporter, Prometheus, and Grafana to implement the cluster monitoring system. The system collects resource consumption in the cluster and displays information about the relevant resources (CPU, memory, network, etc.) in visual form. Through cluster monitoring, problems in the cluster and resource bottlenecks can be found in time.

(3) In terms of model design, this thesis builds a task management system suited to deep learning, in which deep learning models can be constructed through a web interface. The front end of the platform covers the division of page functions, dragging and connecting the corresponding modules, real-time display of task training progress, visualization of result data, and so on. The back end works with the data sent by the front end to implement the corresponding functions, including the database structure design, the representation of the graph model, the verification of the graph model, and the translation of the graph model to the target framework. Through the interaction of front-end and back-end functions, the system supports building and training deep learning tasks and visualizing their test results, which simplifies the way non-professional users build deep learning models. Finally, the application is packaged with Docker and deployed in a Kubernetes cluster in the form of Pods, which are managed and scheduled by Kubernetes.

To confirm that the functions have been implemented correctly, we conduct a deployment test on a computing cluster. We first set up a Kubernetes cluster, then verify and analyze the improved scheduling strategy proposed in this thesis. The results show that the modified scheduling strategy improves usability and can dynamically adjust the priority of user tasks according to the historical usage of computing resources. By monitoring the cluster, users can intuitively understand its resource usage. Finally, the deep learning task management system is deployed in the cluster, where it is scheduled and managed by Kubernetes. Users can quickly build a deep learning model through this platform, which makes it convenient for non-professionals to apply deep learning and improves production efficiency.
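
The abstract does not give the formula used in (1) to adjust task priority. As an illustration only, the idea of combining a user's cluster priority with historical resource usage could be sketched as follows. The names `UserTask`, `effective_priority`, `order_queue`, the `usage_weight` parameter, and the linear penalty are all assumptions made for this sketch, not the thesis's actual algorithm or any Kubernetes API.

```python
from dataclasses import dataclass


@dataclass
class UserTask:
    """Hypothetical record for a pending task (not a Kubernetes object)."""
    name: str
    user_priority: int      # static priority of the submitting user; higher is more important
    used_hours: float       # how long this user has historically occupied compute resources


def effective_priority(task: UserTask, usage_weight: float = 1.0) -> float:
    """Assumed rule: lower the user's base priority in proportion to past usage.

    Heavy past usage reduces the score, so tasks from low-priority but
    lightly-served users can eventually overtake those of heavy users.
    """
    return task.user_priority - usage_weight * task.used_hours


def order_queue(tasks: list[UserTask], usage_weight: float = 1.0) -> list[UserTask]:
    """Return tasks sorted from highest to lowest effective priority."""
    return sorted(
        tasks,
        key=lambda t: effective_priority(t, usage_weight),
        reverse=True,
    )


if __name__ == "__main__":
    queue = [
        UserTask("train-resnet", user_priority=10, used_hours=8.0),
        UserTask("train-lstm", user_priority=5, used_hours=1.0),
    ]
    for task in order_queue(queue):
        print(task.name, effective_priority(task))
```

With these made-up numbers, the task from the lower-priority user is ordered first because that user has consumed far less compute time, which is the fairness effect the abstract describes. In a real cluster such a score would have to be fed into the scheduler extension the thesis implements; that integration is not shown here.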
Keywords/Search Tags:Kubernetes, Docker, Deep Learning, Computing Resource, Resource Scheduling