The Research Of Hadoop Scheduling Algorithm And Improvement Strategy

Posted on:2014-02-17

Degree:Master

Type:Thesis

Country:China

Candidate:P Li

GTID:2248330398471571

Subject:Electronic and communication engineering

Abstract/Summary:

We live in a networked era in which people’s lives are inextricably linked with the Internet. People make friends, shop, and search for whatever information they need online. This boom in online life has produced an enormous amount of data: for example, Facebook stores billions of its users’ pictures on its servers, and the spiders of Google and Baidu collect web pages at the terabyte scale every day. Traditional technologies cannot meet the demands of data at this scale. Against this background, the concept of "cloud computing" emerged.

Hadoop is an open-source distributed computing platform born against the background of "cloud computing" and "big data". It draws on Google’s GFS and MapReduce technologies. Developers can easily write and run applications that process massive data sets without dealing with the details of distributed computing. Because Hadoop is open source, offers strong distributed computing capability, and is easy for developers to use, within just a few years it has become one of the best-known distributed computing platforms.

This thesis presents a thorough study of Hadoop. First, we introduce the concept and technical architecture of "cloud computing", which arose against the background of "big data". Next, we study Hadoop itself and analyze in depth the architecture, working mechanism, and reliability of its key technologies, HDFS and MapReduce. Finally, we investigate Hadoop’s job scheduling mechanism. After studying the three most commonly used scheduling algorithms (FIFO Scheduler, Capacity Scheduler, and Fair Scheduler), we propose a series of improvements, namely job matching, job combination, and a priority strategy, and then implement a new scheduling algorithm based on them.

The new algorithm is called the Dynamic Priority Based Compose Scheduler, or DPBC Scheduler. It uses the principle of job matching to improve scheduling performance. During job matching analysis, a dynamic priority strategy updates the degree of match in real time. In addition, a job combination strategy applies the priority strategy within a job group rather than across the entire job queue, which reduces the scheduling burden. After implementation and testing, the improved algorithm achieved its intended goals and substantially improved system performance.
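
The abstract names the three ideas behind the DPBC Scheduler but does not give its formulas or data structures. The toy sketch below only illustrates how job combination, job matching, and a dynamic priority could fit together. Everything in it is an assumption made for illustration and is not taken from the thesis: the `Job` fields, the "cpu"/"io" labels, the waiting-time aging rule, and the function names. It is not the thesis's implementation and does not use the Hadoop API.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    kind: str            # hypothetical resource label: "cpu" or "io"
    base_priority: int   # static priority assigned at submission
    waited: int = 0      # scheduling rounds this job has spent waiting


def dynamic_priority(job, aging=1):
    # Assumed aging rule: effective priority grows with waiting time.
    return job.base_priority + aging * job.waited


def group_jobs(jobs):
    # Job combination (assumed rule): put jobs of the same kind in one group.
    groups = {}
    for job in jobs:
        groups.setdefault(job.kind, []).append(job)
    return groups


def pick_job(groups, node_kind):
    # Job matching (assumed rule): prefer the group whose kind matches the
    # resource the requesting node has free; if that group is missing or
    # empty, fall back to all remaining jobs.
    candidates = groups.get(node_kind) or [j for g in groups.values() for j in g]
    if not candidates:
        return None
    # The priority comparison runs only over the candidates,
    # not over the entire job queue.
    chosen = max(candidates, key=dynamic_priority)
    groups[chosen.kind].remove(chosen)
    # Every job still waiting ages by one round.
    for group in groups.values():
        for job in group:
            job.waited += 1
    return chosen
```

With jobs `a` (cpu, priority 1), `b` (io, priority 5), and `c` (cpu, priority 3), a node with free CPU capacity would be given `c`, then `a`, and only then `b`, because the priority comparison is confined to the matching group while that group is non-empty.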

Keywords/Search Tags:

Cloud Computing, Distributed, Hadoop, MapReduce, Scheduling algorithm

Related items

1	Research On Scheduling Algroithm In Hadoop Mapreduce
2	The Mapreduce Model In The Hadoop Implementation Of Performance Analysis And Optimization Improvements
3	The Research Of MapReduce Job Scheduling Algorithm Based On The Hadoop Platform
4	Research And Improvement Of MapReduce Scheduling Mechanism On Cloud Computing
5	Research On Optimization And Improvement Of MapReduce Job Scheduling Algorithm
6	The Research And Implementation Of Hadoop Scheduling Algorithm
7	Research On Hadoop Platform And Its Job Scheduling Algorithm
8	Research On Algorithm Analysis And Modificating Of Job Scheduling For Hadoop
9	Research And Improvement Of Job Scheduling Algorithm Based On Hadoop
10	A Priority-based Scheduling Algorithm For Hadoop