Task Scheduling And Virtual Machine Integration Of Data Intensive Batch Processing Workflow

Posted on: 2020-12-12
Degree: Master
Type: Thesis
Country: China
Candidate: K Feng
GTID: 2428330602966002
Subject: Computer application technology
Abstract/Summary:
Cloud computing provides new technical support and development opportunities for big data workflow processing. Data-intensive applications are among the most common applications in the big data context. Because of large data volumes and bandwidth bottlenecks, many instances of the same workflow type are usually merged into a batch workflow and scheduled together to improve execution efficiency. However, data communication is often the performance bottleneck of data-intensive applications, and many workflows are additionally subject to time constraints, so a reasonable scheduling strategy is particularly important for time-constrained data-intensive batch workflows. Such a strategy can not only reduce data transmission across nodes, but also improve the overall efficiency of workflow execution and lower resource leasing costs. Building on previous work, this thesis studies the overall execution efficiency and resource leasing cost of task scheduling in data-intensive batch processing workflows, together with the related techniques. Drawing on the characteristics of these workflows, it examines in depth the mechanisms for optimizing task scheduling efficiency and cost calculation. Taking the advantages of batch workflow scheduling into account, and in order to address cross-node data transmission, overall execution efficiency and resource leasing cost, the thesis studies the following three aspects.

Firstly, the initial allocation of virtual machine types. An integer programming model of the virtual machine type selection problem is formulated and solved with CPLEX. The solution determines the initial virtual machine type for each task, an appropriate task execution time and a better performance-price ratio. An improved local task association degree clustering algorithm is then proposed, which merges tasks that exchange data frequently into a single task for subsequent scheduling, so as to optimize global data communication.

Secondly, the division of workflow deadlines. An iterative task deadline generation method that considers the partial order relationship between tasks and the time gaps between them is proposed. By allocating the time gaps reasonably, the deadlines of the sub-workflows are divided in a way that leaves more scheduling space for subsequent tasks, which addresses both the data transmission bottleneck and the deadline optimization problem of data-intensive applications.

Thirdly, the optimization of virtual machine rental cost. On the basis of the two aspects above, a weighted fusion task scheduling algorithm, MFA, is proposed. It optimizes cost by jointly considering the allocation of the time gap remaining in each leased time slice, the cost of the virtual machines and the efficiency of task execution. On this basis, an RMFA algorithm that adds a right-shift strategy is proposed, which can improve execution efficiency and reduce rental cost when handling data-intensive batch workflows.

Experiments on the three aspects above verify the feasibility of the method and show that scheduling and calculation for data-intensive batch workflows can be completed in a short time. The results of this thesis can serve as a comprehensive technical reference for research on data-intensive batch processing workflows in big data environments.
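To make the idea of dividing a workflow deadline among tasks more concrete, the sketch below stretches each task's earliest finish time by the ratio of the workflow deadline to the critical-path length. This is a simple proportional scheme chosen for illustration, not the iterative method proposed in the thesis; the function name `assign_sub_deadlines`, its parameters and the sample workflow are all hypothetical, and task runtimes are assumed to be known and positive.

```python
from collections import deque


def assign_sub_deadlines(runtimes, edges, deadline):
    """Give every task a sub-deadline by scaling its earliest finish time.

    runtimes: dict mapping task name -> estimated runtime (assumed positive)
    edges:    list of (predecessor, successor) pairs describing the DAG
    deadline: deadline of the whole workflow, in the same time unit
    """
    successors = {task: [] for task in runtimes}
    pending_preds = {task: 0 for task in runtimes}
    for pred, succ in edges:
        successors[pred].append(succ)
        pending_preds[succ] += 1

    # Walk the DAG in topological order (Kahn's algorithm) so that the
    # partial order between tasks is respected.
    ready = deque(t for t in runtimes if pending_preds[t] == 0)
    earliest_start = {task: 0.0 for task in runtimes}
    earliest_finish = {}
    while ready:
        task = ready.popleft()
        earliest_finish[task] = earliest_start[task] + runtimes[task]
        for succ in successors[task]:
            earliest_start[succ] = max(earliest_start[succ], earliest_finish[task])
            pending_preds[succ] -= 1
            if pending_preds[succ] == 0:
                ready.append(succ)

    if len(earliest_finish) != len(runtimes):
        raise ValueError("workflow graph contains a cycle")

    critical_path = max(earliest_finish.values())
    if critical_path <= 0:
        raise ValueError("runtimes must be positive")
    if critical_path > deadline:
        raise ValueError("deadline is shorter than the critical path")

    # Spread the slack (deadline - critical_path) proportionally over tasks.
    scale = deadline / critical_path
    return {task: earliest_finish[task] * scale for task in runtimes}
```

For a hypothetical diamond-shaped workflow with runtimes `{"a": 2, "b": 3, "c": 1, "d": 4}`, edges `a→b`, `a→c`, `b→d`, `c→d` and a deadline of 18, the critical path is 9, so every earliest finish time is doubled and task `d` receives the full deadline. Because the scale factor is at least 1, each task keeps at least its own runtime between its predecessors' sub-deadlines and its own.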
Keywords/Search Tags: Cloud computing, Data-intensive applications, Batch workflow, Task clustering, Workflow deadline partition, Virtual machine rental cost optimization