
Research On Distributed Data Scheduling Algorithm For User Tasks Model In Cloud

Posted on: 2018-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: Q Yuan
GTID: 2348330533969225
Subject: Computer Science and Technology
Abstract/Summary:
With the arrival of the big data era and the maturing of cloud computing technology, big data analysis and processing are increasingly moving to cloud computing platforms. To analyze and process distributed big data on a cloud platform, the first issue to consider is how to migrate the distributed data to the appropriate data centers of that platform. Solving the distributed big data scheduling problem gives cloud service providers (CSPs) a reasonable scheduling strategy, which is important for reducing their costs and improving their quality of service. How CSPs can reduce total cost by exploiting the cost heterogeneity of data centers under different user task models remains an open problem. This thesis studies the big data scheduling problem for two task models: the Bag-of-Tasks (BoTs) model and the directed acyclic graph (DAG) model.

For distributed big data scheduling under the BoTs model, we build a multi-objective programming model whose optimization objectives are cost and delay and whose constraints are data-center capacity and load, and we propose the MMCG algorithm to solve it. MMCG first computes the association (correlation) degree between every pair of user data items, using a calculation method we propose for the BoTs model; these values provide the basis for the subsequent partitioning step. A greedy algorithm based on maximum and minimum cuts then repeatedly partitions the correlation matrix until the capacity and load constraints are satisfied, which yields a feasible solution. The optimization goal is to make both cost and user delay as small as possible while respecting the load and capacity constraints. Our experiments show that MMCG performs well in terms of both cost and delay.

For distributed big data scheduling under the DAG model, we reuse the mathematical programming model of the previous problem and propose the TGCG algorithm, which is designed around the characteristics of the DAG task model. We analyze the differences between the DAG and BoTs task models and summarize the characteristics of the DAG model. Following the system model, platform model, and mathematical model of the previous problem, we again obtain a multi-objective programming problem. TGCG uses a method we propose for calculating the association degree of user data under the DAG model; the resulting correlation degrees serve as the basis for the subsequent deduplication of data clusters. A greedy algorithm based on job-stream segmentation and raw-data deduplication is then applied: partitioning continues until every data cluster satisfies the capacity and load constraints, which yields a feasible solution. As before, the goal is to make both cost and user delay as small as possible under the load and capacity constraints. Our experiments show that TGCG also performs well in terms of cost and delay.
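The abstract does not give MMCG's exact association formula or cut procedure, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes (hypothetically) that the association degree of two data items is the number of BoTs tasks that read both. It also assumes a cluster that exceeds a data center's capacity is split by seeding two groups with the least-associated pair (a cheap "minimum cut" seed) and then attaching every other item to the group it is more strongly associated with (keeping "maximum" association together). The function names `association_matrix` and `greedy_partition` are illustrative, and the load constraint and cost/delay objectives are deliberately omitted.

```python
from itertools import combinations


def association_matrix(tasks, items):
    """Hypothetical association degree: how many tasks read both items."""
    index = {d: i for i, d in enumerate(items)}
    n = len(items)
    m = [[0] * n for _ in range(n)]
    for task in tasks:
        # Each unordered pair of distinct items read by the same task counts once.
        for a, b in combinations(sorted(set(task)), 2):
            i, j = index[a], index[b]
            m[i][j] += 1
            m[j][i] += 1
    return m


def greedy_partition(cluster, m, size, capacity):
    """Recursively split `cluster` (item indices) until each part fits `capacity`.

    A single item larger than `capacity` is returned on its own; such a
    part is infeasible and would need separate handling.
    """
    if len(cluster) == 1 or sum(size[i] for i in cluster) <= capacity:
        return [cluster]
    # Seed the two sides with the least-associated pair (cheap min-cut seed).
    a, b = min(combinations(cluster, 2), key=lambda p: m[p[0]][p[1]])
    left, right = [a], [b]
    for x in cluster:
        if x in (a, b):
            continue
        # Attach each item to the side it is more strongly associated with.
        if sum(m[x][y] for y in left) >= sum(m[x][y] for y in right):
            left.append(x)
        else:
            right.append(x)
    # Both sides are strictly smaller than `cluster`, so recursion terminates.
    return (greedy_partition(left, m, size, capacity)
            + greedy_partition(right, m, size, capacity))


if __name__ == "__main__":
    items = ["d1", "d2", "d3", "d4"]
    tasks = [["d1", "d2"], ["d1", "d2"], ["d3", "d4"], ["d3", "d4"], ["d2", "d3"]]
    m = association_matrix(tasks, items)
    print(greedy_partition([0, 1, 2, 3], m, size=[1, 1, 1, 1], capacity=2))
```

With four unit-size items and a capacity of 2, the strongly co-accessed pairs (d1, d2) and (d3, d4) end up in the same parts, so the script prints `[[0, 1], [2, 3]]`. Recursive bisection was chosen only for brevity; the thesis's own max/min-cut greedy may split clusters differently.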
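For the DAG model, the abstract names job-stream segmentation and raw-data deduplication but does not define them. The sketch below is therefore one possible reading, not the TGCG algorithm itself. It assumes that jobs are visited in topological order and packed into consecutive "job streams". Each stream's data cluster is the deduplicated set of raw data items its jobs read, so an item shared by several jobs in the same stream is counted only once. A new stream is started when adding the next job would push the cluster past a data center's capacity. The names `segment_job_streams`, `deps`, `reads`, `size`, and `capacity` are hypothetical. Only within-stream deduplication is modeled; the association-degree step, the load constraint, and the objectives are omitted. `graphlib.TopologicalSorter` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def segment_job_streams(deps, reads, size, capacity):
    """Greedily cut a job DAG into streams whose raw data fits `capacity`.

    deps:  job -> set of predecessor jobs (the DAG)
    reads: job -> set of raw data items the job reads
    size:  raw data item -> size
    Returns a list of (jobs, data_cluster) pairs. A single job whose own
    reads exceed `capacity` still forms a stream (an infeasible one).
    """
    order = list(TopologicalSorter(deps).static_order())
    streams = []
    jobs, cluster = [], set()
    for job in order:
        # Set union deduplicates raw data shared by jobs in the same stream.
        merged = cluster | reads.get(job, set())
        if jobs and sum(size[d] for d in merged) > capacity:
            # Close the current stream and start a new one with this job.
            streams.append((jobs, cluster))
            jobs, merged = [], set(reads.get(job, set()))
        jobs.append(job)
        cluster = merged
    if jobs:
        streams.append((jobs, cluster))
    return streams


if __name__ == "__main__":
    deps = {"a": set(), "b": {"a"}, "c": {"b"}, "d": {"c"}}  # a -> b -> c -> d
    reads = {"a": {"r1", "r2"}, "b": {"r2"}, "c": {"r3"}, "d": {"r3", "r4"}}
    size = {"r1": 1, "r2": 1, "r3": 1, "r4": 1}
    for jobs, cluster in segment_job_streams(deps, reads, size, capacity=2):
        print(jobs, sorted(cluster))
```

In this chain example, jobs `a` and `b` both read `r2`. Because of deduplication, `a` and `b` fit into one stream with a cluster of size 2, and `c` and `d` form the second stream.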
Keywords/Search Tags: cloud computing, big data, data migration, cost optimization