Scheduling Algorithm And Improvement Strategy In Hadoop

Posted on:2017-11-27

Degree:Master

Type:Thesis

Country:China

Candidate:J Q Wang

GTID:2348330518496395

Subject:Computer Science and Technology

Abstract/Summary:

With the development of information technology and the spread of the Internet, human activity now generates enormous amounts of data. Storing this data and analyzing it effectively has become an urgent problem. As data volumes grow, applications such as data mining and web indexing must process ever-expanding data sets ranging from a few gigabytes to several terabytes or even petabytes. To address this problem, Google proposed a programming model called MapReduce. The main idea of MapReduce is that users only need to express the computation they want to perform, without having to handle the details of parallel computing, fault tolerance, data distribution and load balancing. Hadoop is a popular open-source implementation of Google's MapReduce. The Hadoop framework consists of two main components: HDFS (Hadoop Distributed File System), which stores huge amounts of data, and MapReduce, which analyzes that data. Hadoop has quickly become one of the most popular data processing platforms because of its high reliability, high scalability, high fault tolerance and low cost.

Drawing on a review of the literature, this thesis studies the Hadoop distributed computing platform. First, starting from the origins of Hadoop, we discuss the background and significance of the platform and examine its architecture and key technologies. Second, we study Hadoop's three existing scheduling algorithms, the FIFO Scheduler, the Capacity Scheduler and the Fair Scheduler, analyzing why each was introduced, how it works and where it falls short. Third, to address these shortcomings, we propose a new algorithm called the Dynamic Matching Based on Memory Scheduler (DMBMScheduler). The new algorithm takes in-memory data locality into account when scheduling jobs and assigns work according to the principle of real-time matching. Finally, we implement the algorithm and evaluate it experimentally. The experimental results show that the algorithm achieves its expected goals: it shortens job execution time and response time, remedies the deficiencies of the existing algorithms, and improves the overall performance of the Hadoop platform.
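
The abstract contrasts Hadoop's FIFO and Fair Schedulers with a scheduler that considers in-memory data locality. The following Python sketch illustrates these selection policies under simplifying assumptions; it is not the thesis's DMBMScheduler, whose internals are not described here, and it is not Hadoop's scheduler API. All names (Job, Node, fifo_pick, fair_pick, memory_locality_pick) are hypothetical. The fair policy is reduced to "fewest running tasks first" (the real Fair Scheduler adds pools or queues, weights and minimum shares), the Capacity Scheduler is not modeled, and each node's set of memory-cached blocks is an assumed input.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Job:
    # Hypothetical job model for illustration; not a real Hadoop class.
    job_id: str
    submit_time: int
    pending_blocks: List[str]   # input blocks whose map tasks have not started
    running_tasks: int = 0


@dataclass
class Node:
    # Hypothetical node model; cached_blocks is an assumed view of memory contents.
    name: str
    disk_blocks: Set[str]       # HDFS blocks stored on this node's disks
    cached_blocks: Set[str]     # assumption: blocks currently held in this node's memory


def runnable(jobs: List[Job]) -> List[Job]:
    """Jobs that still have pending map tasks, in submission order."""
    return sorted((j for j in jobs if j.pending_blocks), key=lambda j: j.submit_time)


def fifo_pick(jobs: List[Job]) -> Optional[Job]:
    """FIFO-style choice: the earliest-submitted job with pending work."""
    candidates = runnable(jobs)
    return candidates[0] if candidates else None


def fair_pick(jobs: List[Job]) -> Optional[Job]:
    """Simplified fair-sharing choice: the job with the fewest running tasks,
    ties broken by submission time (pools, weights and minimum shares omitted)."""
    candidates = runnable(jobs)
    return min(candidates, key=lambda j: (j.running_tasks, j.submit_time), default=None)


def memory_locality_pick(jobs: List[Job], node: Node) -> Optional[Tuple[Job, str]]:
    """Hypothetical memory-aware matching for a free slot on `node`:
    prefer an input block cached in the node's memory, then one on its disks,
    otherwise fall back to the first pending block of the FIFO-first job."""
    candidates = runnable(jobs)
    for tier in (node.cached_blocks, node.disk_blocks):
        for job in candidates:
            for block in job.pending_blocks:
                if block in tier:
                    return job, block
    if candidates:
        return candidates[0], candidates[0].pending_blocks[0]
    return None


# Usage example
jobs = [
    Job("j1", submit_time=0, pending_blocks=["b1", "b2"], running_tasks=2),
    Job("j2", submit_time=1, pending_blocks=["b3"]),
]
n1 = Node("n1", disk_blocks={"b2"}, cached_blocks={"b3"})
print(fifo_pick(jobs).job_id)              # j1
print(fair_pick(jobs).job_id)              # j2
job, block = memory_locality_pick(jobs, n1)
print(job.job_id, block)                   # j2 b3
```

In this example FIFO favors the earlier job j1, the simplified fair policy favors j2 because it has no running tasks, and the memory-aware matcher gives n1's free slot to j2 because block b3 is already in n1's memory. Checking the memory tier before the disk tier reflects the idea that reading a cached block avoids disk I/O; using FIFO order within each tier is only a simplifying choice for this sketch.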

Keywords/Search Tags:

Hadoop, MapReduce, job scheduling algorithm

Related items

1	The Mapreduce Model In The Hadoop Implementation Of Performance Analysis And Optimization Improvements
2	The Research Of MapReduce Job Scheduling Algorithm Based On The Hadoop Platform
3	Research And Improvement Of Job Scheduling Algorithms On Hadoop Platform
4	An Optimized MapReduce Workflow Scheduling Algorithm For Heterogeneous Computing
5	Research And Implementation Of Scheduling Algorithm Based On MapReduce Cluster
6	Research On Optimization And Improvement Of MapReduce Job Scheduling Algorithm
7	The Research Of Job Scheduling Algorithm In Mapreduce-styled Massive Data Processing Platform
8	Research And Improvement Of Job Scheduling Algorithm Based On Hadoop
9	Research Of Job Scheduling Technology In Hadoop Platform
10	The Research And Implementation Of Hadoop Scheduling Algorithm