
Research On Scheduling Algorithm Based On Hadoop

Posted on: 2013-12-19 | Degree: Master | Type: Thesis
Country: China | Candidate: M Jiang | Full Text: PDF
GTID: 2248330371984016 | Subject: Electronics and Communications Engineering
Abstract/Summary:
Cloud computing is a new distributed computing model. It distributes tasks across a large cluster composed of many computers and lets users obtain computing power, storage space, and information services on demand. It provides reliable, secure data centers for storage, so users no longer need to worry about data loss, viruses, and similar problems. Technically, cloud computing addresses large-scale parallel computing, distributed data storage, real-time data backup, highly integrated applications, safety, reliability, and personalized applications. It is popular with both enterprises and individual customers. The emergence of cloud computing has far-reaching significance for the evolution of IT and promotes the progress of enterprises and society. It has also brought new opportunities and introduced a more efficient, flexible, and collaborative computing model.

Hadoop is an open-source, Java-based cloud computing platform for analyzing and processing data-intensive distributed workloads. Relying on its advantages of high capacity and low cost, it has become a driving force behind the development of the industry, and the big data revolution is centered on Apache Hadoop. Hadoop is a parallel system for processing massive data: it runs on large clusters and schedules thousands of tasks. The choice of scheduling scheme therefore has a great influence on Hadoop's execution and interactive performance, and research on Hadoop scheduling algorithms is of vital significance.

This thesis first briefly introduces cloud computing. Its focus is research on Hadoop scheduling algorithms. It proposes a load-balancing scheduling algorithm that addresses the shortcomings of the existing algorithm by improving the original algorithm of Hadoop's execution mechanism. The time to end of each task is computed more accurately. The algorithm can find the true stragglers and reassign them to normal nodes. The upper limit on the number of backup tasks changes continuously with the network load in order to keep the network load balanced. This also avoids the congestion caused by running too many backup tasks and improves the overall performance and resource utilization of Hadoop.

In addition, we built a Hadoop cluster and implemented the proposed load-balancing algorithm on it. We tested the algorithm repeatedly, recorded the system performance, and compared the results with the existing scheduling algorithm. The experimental results show that the algorithm applies only to heterogeneous environments. In a heterogeneous environment, it shortens the system response time by 10% and improves the processing efficiency of the system. Dynamically adjusting the upper limit on the number of backup tasks according to the network load avoids wasting system resources.
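
The abstract does not give the formulas behind the algorithm, so the following is only a minimal sketch of the general idea, not the thesis's method. It assumes a simple constant-rate estimate of each task's time to end, a hypothetical rule that flags tasks whose estimate exceeds 1.5 times the median, and a hypothetical linear rule that lowers the backup-task limit as network load rises. All names (`TaskAttempt`, `time_to_end`, `backup_cap`, `pick_backups`, `slow_factor`) are illustrative and are not Hadoop APIs.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TaskAttempt:
    task_id: str
    progress: float  # fraction of work completed, in [0, 1]
    elapsed: float   # seconds since the attempt started


def time_to_end(task: TaskAttempt) -> float:
    """Estimated remaining seconds, assuming a constant progress rate."""
    if task.progress <= 0.0 or task.elapsed <= 0.0:
        return float("inf")  # no rate can be measured yet
    rate = task.progress / task.elapsed
    return (1.0 - task.progress) / rate


def backup_cap(max_backups: int, network_load: float) -> int:
    """Hypothetical rule: shrink the backup-task limit linearly with load."""
    load = min(max(network_load, 0.0), 1.0)  # clamp to [0, 1]
    return int(max_backups * (1.0 - load))


def pick_backups(tasks, max_backups, network_load, slow_factor=1.5):
    """Return ids of the slowest tasks to back up, limited by the current cap."""
    if not tasks:
        return []
    estimates = {t.task_id: time_to_end(t) for t in tasks}
    # Assumed straggler test: estimate well above the median estimate.
    threshold = slow_factor * median(estimates.values())
    stragglers = [tid for tid, est in estimates.items() if est > threshold]
    # Back up the tasks expected to finish last first.
    stragglers.sort(key=estimates.get, reverse=True)
    return stragglers[:backup_cap(max_backups, network_load)]


tasks = [
    TaskAttempt("A", progress=0.9, elapsed=90.0),
    TaskAttempt("B", progress=0.5, elapsed=100.0),
    TaskAttempt("C", progress=0.8, elapsed=80.0),
    TaskAttempt("D", progress=0.1, elapsed=100.0),
]
print(pick_backups(tasks, max_backups=2, network_load=0.0))
print(pick_backups(tasks, max_backups=2, network_load=0.5))
```

In this sketch a busier network yields a smaller cap, so fewer backup tasks are launched and they add less traffic. A real scheduler would need measured load and a better-calibrated time estimate than the constant-rate assumption used here.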
Keywords/Search Tags:Hadoop, MapReduce, scheduling algorithm, load balancing