Online Placement And Scaling Of Geo-Distributed Machine Learning Jobs

Posted on: 2021-11-06
Degree: Master
Type: Thesis
Country: China
Candidate: X T Li
GTID: 2518306194975819
Subject: Computer system architecture
Abstract/Summary:
The parameter server architecture is widely used in geo-distributed machine learning to train global models. To process large, geographically dispersed data collections efficiently, the workers and parameter servers of each job are distributed across different data centers. However, when faced with large-scale training data and complex model parameters, geo-distributed machine learning jobs must deploy a large number of workers and parameter servers, far exceeding the computing and storage capacity of small local clusters. Users therefore need to rent cloud resources to train complex models. As the number of geo-distributed machine learning jobs submitted to a cloud platform grows, deploying all of them efficiently so as to minimize long-term cost becomes a key challenge. We study a cloud broker service that aggregates geo-distributed machine learning jobs into cloud data centers through dynamic online placement and scaling of these jobs. To minimize cost, we formulate this scaling problem as a mathematical optimization problem. To decide the data migration strategy and the deployment of workers and parameter servers, we propose an efficient online algorithm that first uses regularization to decompose the online problem into a series of one-shot optimization problems, each solvable within an individual time slot, and then rounds the fractional decisions to integer ones via a carefully designed dependent rounding method. As a theoretical performance guarantee, we prove that the online algorithm achieves a parameterized-constant competitive ratio. Extensive simulation studies in realistic settings further show close-to-offline-optimum practical performance: our algorithm saves at least 20% of the cost compared with other algorithms. The online scheduling algorithm proposed in this thesis offers a new direction for scheduling geo-distributed machine learning jobs, effectively reducing training costs and improving resource utilization while satisfying users' demands.
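The abstract does not detail its "carefully designed" dependent rounding step, so the sketch below only illustrates the general idea using a generic, textbook pairwise dependent rounding scheme; it is an assumption, not the thesis's method. It treats each entry of `x` as a fractional number of workers (or parameter servers) assigned to one data center, and the function name `dependent_round` is hypothetical. Each step pairs two fractional entries and shifts mass between them so that at least one becomes integral, with probabilities chosen so that every entry keeps its expected value and the total stays fixed.

```python
import math
import random
from typing import List, Optional


def dependent_round(x: List[float], rng: Optional[random.Random] = None,
                    eps: float = 1e-9) -> List[int]:
    """Generic pairwise dependent rounding (illustrative sketch, not the thesis's scheme).

    Guarantees: E[result[i]] == x[i] for every i, each result[i] is floor or ceil
    of x[i], and sum(result) == sum(x) whenever sum(x) is an integer.
    """
    rng = rng or random.Random()
    base = [math.floor(v) for v in x]              # integer parts stay fixed
    frac = [v - b for v, b in zip(x, base)]        # fractional parts in [0, 1)

    def is_fractional(i: int) -> bool:
        return eps < frac[i] < 1 - eps

    pending = [i for i in range(len(x)) if is_fractional(i)]
    while len(pending) >= 2:
        i, j = pending[0], pending[1]
        alpha = min(1 - frac[i], frac[j])          # mass that can move from j to i
        beta = min(frac[i], 1 - frac[j])           # mass that can move from i to j
        # Choosing this probability makes the expected change of frac[i] and frac[j] zero.
        if rng.random() < beta / (alpha + beta):
            frac[i] += alpha
            frac[j] -= alpha
        else:
            frac[i] -= beta
            frac[j] += beta
        # At least one of i, j is now integral (up to eps), so the loop terminates.
        pending = [k for k in pending if is_fractional(k)]

    if pending:                                    # only happens if sum(frac) is not integral
        k = pending[0]
        frac[k] = 1.0 if rng.random() < frac[k] else 0.0

    return [b + int(round(f)) for b, f in zip(base, frac)]
```

For example, `dependent_round([0.3, 0.7, 1.4, 2.6])` always returns five units in total, and over many runs the first data center receives one worker about 30% of the time. Because the total is preserved exactly, a capacity or demand constraint written as a sum holds after rounding, which is the usual reason to prefer dependent rounding over rounding each entry independently.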
Keywords/Search Tags: Cloud Resource Scheduling, Geo-Distributed Machine Learning, Cost Minimization Problem, Online Scaling and Scheduling, Mathematical Optimization