
Research Of Job Scheduling On Cloud Computing

Posted on: 2013-10-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H. L. Shi
Full Text: PDF
GTID: 1228330395983687
Subject: Computer application technology

Abstract/Summary:
Opening a new era of the Internet, cloud computing will drive prosperity across the software, hardware, communication, and networking industries, and is a milestone on humanity's path to a "Smart Earth". Integrating parallel computing, distributed computing, and grid computing, cloud computing is a revolution in software and hardware technology, virtualization, and networking. Its goal is to allocate Internet resources (computing, storage, and network resources) on demand, like water and electricity, rationally assigning resources according to task complexity and dataset size.

At present, research on job scheduling in cloud computing focuses mainly on the Hadoop and MapReduce models. Hadoop, put forward by the Apache Software Foundation, has a strong advantage for simple queries over intensive data: in essence it performs the same operation on different data sets, greatly favoring data parallelism. For a complex task request consisting of many subtasks, however, that parallelism is limited to a great extent. At the same time, its built-in FIFO, Fair Scheduler (FS), and Capacity Scheduler (CS) algorithms have many drawbacks, such as poor QoS, frequent scheduling, resource fragmentation, and rigid allocation. This thesis therefore studies job scheduling from two aspects, the scheduling model and the scheduling algorithm. The main innovations are as follows:

(1) For complex tasks, the author puts forward an improved MapReduce model that converts the DAG (Directed Acyclic Graph) of a complex task into an MCST (Minimum Cost Spanning Tree) before the Map phase begins, providing a reliable basis for minimum execution time. This model greatly extends Hadoop's application domain from commercial and interactive workloads to scientific computing. In particular, for researchers familiar only with MPI (Message Passing Interface), it provides a new way to use cheap commodity PCs for scientific research and system architecture.
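The abstract gives no pseudocode for the DAG-to-MCST conversion in contribution (1); the following is a minimal illustrative sketch. It assumes the task DAG's edges carry inter-subtask cost weights and treats them as an undirected weighted graph, running Kruskal's algorithm; the function and variable names are hypothetical.

```python
# Hypothetical sketch of the DAG -> minimum cost spanning tree step in
# contribution (1), using Kruskal's algorithm with union-find. Edge
# weights are assumed to model inter-subtask communication/execution cost.

def min_cost_spanning_tree(num_tasks, edges):
    """edges: list of (cost, u, v) over subtask ids 0..num_tasks-1.
    Returns the edge set of a minimum cost spanning tree."""
    parent = list(range(num_tasks))

    def find(x):                      # union-find root with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for cost, u, v in sorted(edges):  # consider cheapest edges first
        ru, rv = find(u), find(v)
        if ru != rv:                  # edge joins two components: keep it
            parent[ru] = rv
            tree.append((u, v, cost))
    return tree

# Example: a 4-subtask graph, edge weight = data-transfer cost
print(min_cost_spanning_tree(4, [(3, 0, 1), (1, 0, 2), (2, 1, 2), (4, 2, 3)]))
```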
(2) In response to users' task requests of multiple types and granularities, the author proposes an ACO (Ant Colony Optimization) task scheduling algorithm for cloud computing. It fuses ACO's dynamic parallelism into the Hadoop architecture, greatly extending parallelism and distribution, enhancing interactivity, and avoiding the long waiting times that other algorithms incur when coarse-grained tasks are embedded among a large number of small tasks. Through intelligent global and local pheromone updating, and by bounding pheromone values between a maximum and a minimum, the improved ACO algorithm reduces the possibility of being trapped in a local optimum. This heuristic scheduling algorithm is well suited to environments in which resource provisioning and task requests change dynamically, using the dynamically changing pheromone to find a globally optimal solution (see the first sketch after point (4)).

(3) In cloud environments with a low degree of virtualization, an ordinary commodity PC is an independent resource with multiple attributes. The author proposes a multi-objective query processing method based on a probability model, Improved Approximate Skyline: the Master transmits a probability threshold to the Slave nodes, which filter out nodes unsuitable for multi-objective assignment. Only the performance parameters of high-probability nodes are uploaded to the Master, which greatly reduces the volume of data transmitted; at the same time, the active character of the algorithm avoids the negative effect of the Hadoop model's Heartbeat interval on task execution. The method combines the Skyline query with the advantages of Hadoop's built-in FIFO algorithm, bringing both query efficiency and allocation efficiency into full play (see the second sketch after point (4)).

(4) Because differences in task granularity can be very large, fixed resource allocation easily wastes resources or causes overload. The author puts forward an elastic task scheduling model based on task granularity and resource granularity, sketched third below. In this model, free nodes can join a heavily loaded node cluster or withdraw from a lightly loaded one; node clusters act as an intermediate computing layer between Master and Slave. Cluster membership is not fixed: a cluster takes shape when a task is created and disappears when the task completes. When a cluster's load rate is too high, free nodes are added to it; when the load rate is too low, nodes are released from it. Through this elastic computation the model breaks the "time for quality" shackle and creates a "space for quality" elastic computation model, greatly enhancing system flexibility and scalability.
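Contribution (2) names global and local pheromone updates with max-min bounds but gives no pseudocode; the sketch below is a hypothetical Python rendering in the style of a Max-Min Ant System. Tasks are assigned to nodes with probability proportional to pheromone and a heuristic (here, node speed); a local update evaporates pheromone after each assignment, a global update reinforces the best-so-far schedule, and values are clamped to [TAU_MIN, TAU_MAX]. All parameter names and values are illustrative assumptions.

```python
import random

# Hypothetical Max-Min ACO task-to-node scheduler for contribution (2).
TAU_MIN, TAU_MAX = 0.1, 5.0
ALPHA, BETA, RHO = 1.0, 2.0, 0.1   # pheromone weight, heuristic weight, evaporation

def ant_schedule(task_sizes, node_speeds, iters=50, ants=10):
    n_tasks, n_nodes = len(task_sizes), len(node_speeds)
    tau = [[1.0] * n_nodes for _ in range(n_tasks)]      # pheromone matrix
    best, best_makespan = None, float("inf")
    for _ in range(iters):
        for _ in range(ants):
            load, assign = [0.0] * n_nodes, []
            for t in range(n_tasks):
                # choice probability ~ tau^ALPHA * speed^BETA
                w = [tau[t][m] ** ALPHA * node_speeds[m] ** BETA
                     for m in range(n_nodes)]
                m = random.choices(range(n_nodes), weights=w)[0]
                assign.append(m)
                load[m] += task_sizes[t] / node_speeds[m]
                # local update: evaporate toward TAU_MIN to keep exploring
                tau[t][m] = max(TAU_MIN, (1 - RHO) * tau[t][m])
            makespan = max(load)
            if makespan < best_makespan:
                best, best_makespan = assign, makespan
        # global update: reinforce the best-so-far schedule, capped at TAU_MAX
        for t, m in enumerate(best):
            tau[t][m] = min(TAU_MAX, tau[t][m] + 1.0 / best_makespan)
    return best, best_makespan

print(ant_schedule([5, 3, 8, 2], [1.0, 2.0]))
```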
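For contribution (3), the abstract specifies a probability threshold sent from Master to Slaves followed by Skyline (non-dominated) filtering, but not the probability model itself. The sketch below is therefore hypothetical: it scores each node by the mean of its normalized attributes as a stand-in for the thesis's probability, then keeps only non-dominated survivors to report to the Master.

```python
# Hypothetical Slave-side filter for contribution (3). The score (mean of
# normalized attributes) stands in for the thesis's probability model.
# Attributes are "larger is better", e.g. (cpu_free, mem_free, bandwidth).

def dominates(a, b):
    """a dominates b if a >= b in every attribute and > in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def skyline_filter(nodes, threshold):
    # Step 1: probability-threshold prefilter transmitted by the Master
    candidates = {n: a for n, a in nodes.items() if sum(a) / len(a) >= threshold}
    # Step 2: keep only Skyline (non-dominated) candidates
    return {n: a for n, a in candidates.items()
            if not any(dominates(b, a) for m, b in candidates.items() if m != n)}

# Normalized (cpu_free, mem_free, bandwidth) per Slave node
nodes = {"s1": (0.9, 0.6, 0.8), "s2": (0.4, 0.9, 0.5),
         "s3": (0.3, 0.3, 0.4), "s4": (0.8, 0.5, 0.7)}
print(skyline_filter(nodes, threshold=0.5))  # only these upload to the Master
```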
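Contribution (4) describes clusters that grow when overloaded and shrink when underloaded. The following is a minimal sketch of that join/withdraw rule; the HIGH/LOW watermarks and the load-rate definition (busy slots over member count) are illustrative assumptions, not values from the thesis.

```python
# Hypothetical elastic cluster for contribution (4).
HIGH, LOW = 0.8, 0.3   # assumed load-rate watermarks

class ElasticCluster:
    def __init__(self, nodes, free_pool):
        self.nodes = list(nodes)    # members of this task's cluster
        self.free_pool = free_pool  # shared pool of idle nodes

    def rebalance(self, busy_slots):
        rate = busy_slots / max(1, len(self.nodes))
        if rate > HIGH and self.free_pool:        # overloaded: borrow a free node
            self.nodes.append(self.free_pool.pop())
        elif rate < LOW and len(self.nodes) > 1:  # underloaded: release a node
            self.free_pool.append(self.nodes.pop())
        return rate

pool = ["n4", "n5"]
c = ElasticCluster(["n1", "n2", "n3"], pool)
c.rebalance(busy_slots=3)  # rate 1.0 > HIGH: the cluster grows to four nodes
c.rebalance(busy_slots=1)  # rate 0.25 < LOW: a node returns to the pool
print(c.nodes, pool)
```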
Finally, the thesis summarizes the innovative aspects of the proposed scheduling models and algorithms, and then looks ahead to the development of cloud computing and to future research directions in task scheduling, concluding that reconfigurability, scalability, availability, data storage layout optimization strategies, task duplication, and task clustering will be the next research emphases and difficulties.

Keywords/Search Tags: Cloud computing, Job scheduling, Hadoop, MapReduce, ACO (Ant Colony Optimization), DAG (Directed Acyclic Graph), Skyline