Research On Scheduling Algorithm Based On Hadoop

Posted on:2013-12-19

Degree:Master

Type:Thesis

Country:China

Candidate:M Jiang

Full Text:PDF

GTID:2248330371984016

Subject:Electronics and Communications Engineering

Abstract/Summary:

Cloud computing is a new kind of distributed computing model. It distributes tasks across a large cluster composed of many computers, and enables users to obtain computing power, storage space and information services on demand. It provides a reliable, safe data center for storage, so users no longer need to worry about data loss, viruses and similar problems. Technically, cloud computing addresses large-scale parallel computing, distributed data storage, real-time data backup, highly integrated applications, safety, reliability and personalized applications. It is popular with both enterprises and individual customers. The appearance of cloud computing has far-reaching significance for the evolution of IT and promotes the progress of enterprises and society. It has also brought new opportunities and started a more efficient, flexible and collaborative computing model.

Hadoop is an open source, Java-based cloud computing platform used for the analysis and processing of distributed, data-intensive workloads. Relying on its advantages of high capacity and low cost, it has become a driving force behind the development of the industry, and the big data revolution is centered on Apache Hadoop. Hadoop is a parallel system for processing massive data. It runs on large clusters and schedules thousands of tasks, so the choice of scheduling scheme has a great influence on Hadoop's execution and interaction performance. Research on scheduling algorithms for Hadoop is therefore of vital significance.

This thesis first introduces cloud computing briefly. Its focus is research on Hadoop scheduling algorithms. It proposes a load-balancing scheduling algorithm that addresses the shortcomings of the existing algorithm by improving Hadoop's original execution mechanism. The estimate of a task's time to end is more accurate, so the algorithm can find the true stragglers and reassign them to normal nodes. The upper limit on the number of backup tasks changes continually according to network load conditions in order to keep the network load balanced. This also avoids the congestion caused by executing too many backup tasks, and improves the overall performance and the utilization of system resources on Hadoop.

In addition, we built a Hadoop cluster and implemented the proposed load-balancing algorithm on it. We tested the algorithm repeatedly, recorded the system performance, and compared the results with the existing scheduling algorithm. According to the experimental results, the algorithm applies only to heterogeneous environments. In a heterogeneous environment, it shortens the system response time by 10% and improves the processing efficiency of the system. Wasted system resources can be avoided by dynamically adjusting the upper limit on the number of backup tasks according to network load conditions.
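
The two ideas described above, estimating each task's time to end to find stragglers and capping backup tasks according to network load, can be illustrated with a minimal Python sketch. This is not the algorithm from the thesis: the abstract gives no formulas, so the progress-rate estimate, the median-based threshold, the linear cap rule, and every name here (`TaskAttempt`, `time_to_end`, `backup_cap`, `pick_stragglers`, `slow_factor`) are assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TaskAttempt:
    task_id: str
    progress: float   # fraction complete, 0.0 to 1.0
    elapsed_s: float  # seconds since the attempt started


def time_to_end(task: TaskAttempt) -> float:
    """Estimate remaining seconds from the average progress rate so far (assumed model)."""
    if task.progress <= 0.0 or task.elapsed_s <= 0.0:
        return float("inf")
    rate = task.progress / task.elapsed_s
    return (1.0 - task.progress) / rate


def backup_cap(max_backups: int, network_load: float) -> int:
    """Shrink the backup-task limit linearly as network load (0.0 to 1.0) rises (assumed rule)."""
    load = min(max(network_load, 0.0), 1.0)
    return int(max_backups * (1.0 - load))


def pick_stragglers(tasks, network_load, max_backups=10, slow_factor=1.5):
    """Return ids of running tasks to back up, slowest first, within the current cap."""
    running = [t for t in tasks if t.progress < 1.0]
    if not running:
        return []
    estimates = {t.task_id: time_to_end(t) for t in running}
    # Assumed threshold: a straggler's estimate exceeds slow_factor times the median.
    threshold = slow_factor * median(estimates.values())
    slow = [tid for tid, est in estimates.items() if est > threshold]
    slow.sort(key=lambda tid: estimates[tid], reverse=True)
    return slow[:backup_cap(max_backups, network_load)]


if __name__ == "__main__":
    demo = [
        TaskAttempt("a", 0.5, 100.0),
        TaskAttempt("b", 0.8, 100.0),
        TaskAttempt("c", 0.1, 100.0),
        TaskAttempt("d", 0.6, 100.0),
    ]
    print(pick_stragglers(demo, network_load=0.0))
    print(pick_stragglers(demo, network_load=1.0))
```

In this sketch a fully loaded network drives the cap to zero, so no backup tasks are launched even when stragglers exist. That reflects the trade-off the abstract describes between speculative re-execution and network congestion.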

Keywords/Search Tags:

Hadoop, MapReduce, scheduling algorithm, load balancing

Related items

1	Research On Scheduling Algorithm Based On Hadoop
2	Research On MapReduce Performance Optimization Based On Hadoop
3	Research Of Job Scheduling On Hadoop Platform Based On Load Balancing
4	Research On Hadoop Distributed System Of Scheduling Algorithm
5	Research And Improvement Of Job Scheduling Algorithms On Hadoop Platform
6	Based On Feedback Scheduling Algorithms For Dynamic Load Balancing In The Heterogeneous Environment Of Hadoop Design And Implementation
7	Research On Optimization Of Data Load Balancing In Hadoop Clusters And Application Of Hadoop Platform
8	Research On Scheduling Algorithm In Hadoop Mapreduce
9	Research And Implementation Of Local Priority Scheduling Algorithm Based On Mapreduce For Massive Data
10	Research On Key Issues Of Task And Job Scheduling For MapReduce Clusters