Optimal Scheduling Of Machine Learning Tasks In Container Computational Environment

Posted on: 2018-06-23
Degree: Master
Type: Thesis
Country: China
Candidate: L J Gong
GTID: 2348330542988923
Subject: Computer application technology
Abstract/Summary:
In the big data era, a great deal of valuable information is waiting to be discovered and used, and decision makers increasingly rely on machine learning, through offline or real-time analysis, to make decisions. Machine learning on big data is inseparable from cloud computing, one of whose core technologies is virtualization. In the past, many applications and services were deployed directly on virtual machines, but virtual machines take a long time to start because each one runs its own guest operating system (OS): starting a virtual machine means booting its OS, and every running guest OS consumes extra memory. Because hypervisor-based virtualization suffers from this performance overhead and low resource utilization, a new type of virtualization technology, the container, has emerged to address these problems. The emergence of Docker containers is changing the way people develop, test and deploy applications. Compared with a virtual machine, a container can be created, started and destroyed very quickly, uses system resources efficiently, and incurs very little system overhead. On this basis, this paper studies the optimal scheduling of machine learning tasks in a container-based computational experiment environment.

Motivated by the limitations of the academic literature and by practical demands, and building on a summary of experimental results and frontier methods for resource and task scheduling in container environments, this paper constructs a container-based computational experiment environment and a performance monitoring platform. Using this environment, it runs computational experiments on machine learning classification, regression and clustering tasks and derives the patterns of computing-resource consumption for different types of machine learning tasks under different models and data sets. In addition, it builds a model for allocating and scheduling containers on virtual machines that aims to minimize the virtual machine leasing cost, the time cost and the idle-resource cost.

The main work of this paper is as follows.

First, this paper reviews the three cloud computing service modes; machine learning algorithms and models, especially classification, regression and clustering models; existing machine learning algorithms at home and abroad; and related research results on the optimal scheduling of computing resources.

Second, a container-based computational experiment environment and a performance monitoring platform are built. Their main function is to provide a computational environment for the classification, clustering and regression experiments and to monitor the resource consumption (such as CPU and memory usage) of each machine learning task during execution. The environment consists of a physical machine (a real computer), a Linux virtual machine (a virtual computer) and Docker containers. We build a virtual machine on the physical machine and then build Docker containers on the virtual machine, after which many containers can be deployed. The performance monitoring platform consists of cAdvisor, which collects each container's resource consumption data such as CPU and memory usage; InfluxDB, which stores the data; and Grafana, which visualizes the resource usage of a specific container over a given period as diagrams or tables.

Third, based on the experimental environment and monitoring platform, 16 computational experiments on machine learning classification, regression and clustering tasks are carried out. The data sets come from Spark and from competitions. Each experiment is defined by a machine learning task type, an algorithm (model) and a data set. The experimental results show that the naive Bayes and support vector machine models use less CPU on average when processing a data set with relatively many samples but few attributes than when processing one with relatively few samples but many attributes. The decision tree model shows the opposite pattern. Compared with the K-means model, Bisecting K-means uses more CPU on the same small data set.

Fourth, this paper builds a model for allocating and scheduling containers on virtual machines that aims to minimize the virtual machine leasing cost, the time cost and the idle-resource cost. The model is subject to two constraints: each virtual machine must have sufficient resources for the containers deployed on it, and each container must meet the resource requirements of its machine learning task. The model decides which configuration each container should be deployed with and how the containers are allocated to virtual machines at the least cost. The AIMMS optimization modeling software is used to solve the problem, and the solution is also obtained by calling AIMMS from Java.
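
The allocation model in the fourth item can be illustrated with a small brute-force search. This is only a sketch under stated assumptions, not the thesis's AIMMS formulation: the task requirements, container configurations, virtual machine sizes and prices below are all hypothetical, the time cost is left out for brevity, and exhaustive enumeration stands in for the optimization solver. It keeps the two constraints named above (containers must cover task requirements, virtual machines must have room for their containers) and minimizes leasing cost plus idle-CPU cost.

```python
from itertools import product

# All numbers below are hypothetical and are not taken from the thesis.
TASKS = {  # task name -> required (CPU cores, memory in GB)
    "classify": (2, 4),
    "cluster": (1, 2),
    "regress": (1, 1),
}
CONFIGS = [(1, 2), (2, 4), (4, 8)]  # available container sizes (CPU, memory)
VMS = {  # one leasable instance per entry
    "vm_small": {"cpu": 2, "mem": 4, "lease": 1.0},
    "vm_large": {"cpu": 4, "mem": 8, "lease": 1.8},
}
IDLE_PRICE = 0.1  # assumed cost per idle CPU core on a leased VM


def plan_cost(plan):
    """Return the cost of a plan {task: (config, vm)}, or None if infeasible."""
    used = {}
    for task, (cfg, vm) in plan.items():
        need_cpu, need_mem = TASKS[task]
        # Constraint: the container must meet the task's resource requirement.
        if cfg[0] < need_cpu or cfg[1] < need_mem:
            return None
        cpu, mem = used.get(vm, (0, 0))
        used[vm] = (cpu + cfg[0], mem + cfg[1])
    cost = 0.0
    for vm, (cpu, mem) in used.items():
        spec = VMS[vm]
        # Constraint: the VM must have enough resources for its containers.
        if cpu > spec["cpu"] or mem > spec["mem"]:
            return None
        # Leasing cost for each used VM plus a charge for idle CPU cores.
        cost += spec["lease"] + IDLE_PRICE * (spec["cpu"] - cpu)
    return cost


def best_plan():
    """Enumerate every (config, vm) choice per task and keep the cheapest."""
    tasks = list(TASKS)
    choices = list(product(CONFIGS, VMS))
    best = None
    for combo in product(choices, repeat=len(tasks)):
        plan = dict(zip(tasks, combo))
        cost = plan_cost(plan)
        if cost is not None and (best is None or cost < best[0]):
            best = (cost, plan)
    return best


if __name__ == "__main__":
    cost, plan = best_plan()
    print(cost)
    for task, (cfg, vm) in plan.items():
        print(task, cfg, vm)
```

With these made-up numbers the search packs all three containers onto the larger virtual machine, since leasing a second machine costs more than it saves. Enumeration grows exponentially with the number of tasks, which is one reason a dedicated solver is the natural choice at realistic problem sizes.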
Keywords/Search Tags:cloud computing, virtual machines, container, machine learning