
A Study Of Key Technologies About Task Scheduling On Distributed Stream Computing Platform

Posted on: 2015-04-08
Degree: Master
Type: Thesis
Country: China
Candidate: S P Zhang
GTID: 2348330542452507
Subject: Engineering
Abstract/Summary:
In recent years, the rapid development of data processing technology has produced a large number of applications built on data analysis. Data collection technology in many fields also keeps improving, and it generates large volumes of unstructured real-time data that flow into data processing systems like water. The critical problem is how to extract valuable information from these real-time data streams and classify and compute on it accurately in real time. The information must be processed as quickly as possible, and the corresponding conclusions must be drawn before the data disappear. Traditional distributed processing models cannot meet these requirements. As a result, a new distributed processing model, stream computing, has emerged. Industry has welcomed it because it is scalable, flexible, and easy to use. This thesis presents a complete stream computing platform. Users of the platform do not have to build clusters, operate and maintain the platform, or implement communication themselves, which greatly shortens the development cycle.

In a distributed stream computing system (for example, a cloud computing environment), many tasks must run complex computations. These tasks are usually assigned to many processors, and this process is called task scheduling. For the same input data stream, different scheduling algorithms can produce very different performance. In the traditional processing mode, input data are mostly static, so task execution times are predictable. Given a DAG and a limited number of processors, the HEFT (Heterogeneous Earliest Finish Time) algorithm produces an efficient schedule very quickly. A stream computing platform, however, receives a continuous stream of input data whose volume is uncertain, so task execution times are also uncertain. In a real application environment, this causes HEFT's experimental results to differ considerably from the expected results.

When a workflow is scheduled in a distributed stream computing system, it is therefore essential to handle the uncertain execution time of individual tasks, because this uncertainty can degrade static workflow scheduling approaches. This thesis proposes a novel task scheduling approach based on the Monte Carlo method, built on top of HEFT, a classic static task scheduling heuristic. A random number generation algorithm produces a large number of task execution times under given constraints, and normal and uniform models are used to simulate these times, which keeps HEFT applicable. Running HEFT on these randomly generated execution times yields many candidate schedules, and the best one is selected as the final output. The thesis studies the key techniques involved throughout the process to ensure the efficiency of the algorithm. These include the random number generation mechanism, the choice of the completion-time threshold, the limit on the number of repetitions in each stage, and the performance evaluation criteria. The method was applied to the "water line cloud" platform. The experimental results show that it shortens task scheduling time and significantly improves platform performance, and that it is also highly versatile.
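The abstract describes the method only at a high level. The Python sketch below illustrates the general idea and makes several assumptions, none of which come from the thesis. It uses a simplified, non-insertion variant of HEFT: tasks are ordered by upward rank, and each task goes to the processor that gives it the earliest finish time. The insertion-based slot search of full HEFT is omitted. Execution times are drawn from a normal model (20% standard deviation, clipped at 10% of the mean) or from a uniform model (±20%) around hypothetical mean costs. Each candidate schedule is scored by its mean makespan over a shared set of sampled scenarios. The DAG, the cost values, the spreads, the sample counts, and all function names (`heft`, `makespan`, `sample_costs`, `monte_carlo_heft`) are hypothetical. The thesis's actual constraints, completion-time threshold, repetition limits, and evaluation criteria are not reproduced here.

```python
import random
import statistics

# Hypothetical workflow DAG: task -> list of (successor, communication cost).
EDGES = {
    "A": [("B", 2.0), ("C", 3.0)],
    "B": [("D", 1.0)],
    "C": [("D", 2.0)],
    "D": [],
}
# Hypothetical mean execution time of each task on each of two heterogeneous processors.
MEAN_COST = {"A": [4.0, 6.0], "B": [5.0, 3.0], "C": [6.0, 4.0], "D": [3.0, 5.0]}
NUM_PROCS = 2


def predecessors(edges):
    """Invert successor lists: task -> list of (predecessor, communication cost)."""
    preds = {t: [] for t in edges}
    for t, succs in edges.items():
        for s, c in succs:
            preds[s].append((t, c))
    return preds


def upward_ranks(edges, cost):
    """HEFT priority: rank_u(t) = mean cost of t + max over successors of (comm + rank_u)."""
    ranks = {}

    def rank(t):
        if t not in ranks:
            tail = max((c + rank(s) for s, c in edges[t]), default=0.0)
            ranks[t] = statistics.mean(cost[t]) + tail
        return ranks[t]

    for t in edges:
        rank(t)
    return ranks


def ready_time(t, p, preds, finish, assign):
    """Time when all inputs of t are available on processor p (no comm cost on the same processor)."""
    return max((finish[u] + (0.0 if assign[u] == p else c) for u, c in preds[t]), default=0.0)


def heft(edges, cost, num_procs):
    """Simplified HEFT without insertion: returns (priority order, task -> processor)."""
    preds = predecessors(edges)
    ranks = upward_ranks(edges, cost)
    # Decreasing rank_u is a topological order because all costs are positive.
    order = sorted(edges, key=lambda t: ranks[t], reverse=True)
    proc_free = [0.0] * num_procs
    finish, assign = {}, {}
    for t in order:
        # Choose the processor that gives t the earliest finish time (ties -> lower index).
        eft, p = min((max(ready_time(t, q, preds, finish, assign), proc_free[q]) + cost[t][q], q)
                     for q in range(num_procs))
        finish[t], assign[t] = eft, p
        proc_free[p] = eft
    return order, assign


def makespan(order, assign, edges, cost):
    """Replay a fixed schedule (same assignment and order) under other execution times."""
    preds = predecessors(edges)
    proc_free, finish = {}, {}
    for t in order:
        p = assign[t]
        finish[t] = max(ready_time(t, p, preds, finish, assign), proc_free.get(p, 0.0)) + cost[t][p]
        proc_free[p] = finish[t]
    return max(finish.values())


def sample_costs(mean_cost, model, rng):
    """Draw one execution-time matrix; the spreads and the clipping are assumed constraints."""
    if model == "normal":
        return {t: [max(0.1 * m, rng.gauss(m, 0.2 * m)) for m in ms] for t, ms in mean_cost.items()}
    return {t: [rng.uniform(0.8 * m, 1.2 * m) for m in ms] for t, ms in mean_cost.items()}


def monte_carlo_heft(edges, mean_cost, num_procs, n_candidates=200, n_eval=200, seed=0):
    """Run HEFT on many sampled cost matrices; keep the schedule with the lowest mean makespan."""
    rng = random.Random(seed)
    # Shared evaluation scenarios so every candidate schedule is scored on the same data.
    scenarios = [sample_costs(mean_cost, rng.choice(["normal", "uniform"]), rng) for _ in range(n_eval)]
    best = None
    for i in range(n_candidates):
        model = "normal" if i % 2 == 0 else "uniform"  # alternate the two models
        order, assign = heft(edges, sample_costs(mean_cost, model, rng), num_procs)
        score = statistics.mean(makespan(order, assign, edges, s) for s in scenarios)
        if best is None or score < best[0]:
            best = (score, order, assign)
    return best


if __name__ == "__main__":
    print("static HEFT on mean costs:", heft(EDGES, MEAN_COST, NUM_PROCS)[1])
    print("Monte Carlo pick (mean makespan, order, assignment):", monte_carlo_heft(EDGES, MEAN_COST, NUM_PROCS))
```

Every candidate is scored against the same sampled scenarios, so differences in mean makespan reflect the schedules themselves and not sampling noise between candidates. A threshold-based criterion could replace the mean makespan in `monte_carlo_heft` without changing the rest of the sketch.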
Keywords/Search Tags: Heterogeneous computing systems, Task scheduling, Static scheduling heuristic, Monte Carlo methods