Scheduling Algorithm And Improvement Strategy In Hadoop

Posted on: 2017-11-27
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Wang
Full Text: PDF
GTID: 2348330518496395
Subject: Computer Science and Technology
Abstract/Summary:
With the development of information technology and the spread of the Internet, human activity now generates large amounts of data. How to store this data and analyze it effectively has become an urgent problem. As data volumes grow, applications such as data mining and web indexing must access ever-expanding data sets ranging from a few gigabytes to several terabytes or even petabytes. To address this problem, Google proposed a programming model called MapReduce. The main idea of MapReduce is that users only need to express the computation they want, without having to handle the details of parallel computing, fault tolerance, data distribution, and load balancing. Hadoop is a popular open-source implementation of Google's MapReduce. The Hadoop framework consists of two main components: HDFS (Hadoop Distributed File System) and MapReduce. HDFS stores the data and MapReduce analyzes it. Hadoop has quickly become one of the most popular data processing platforms because of its high reliability, high scalability, high fault tolerance, and low cost.

This thesis studies the Hadoop distributed computing platform through a literature review. First, we discuss the background and significance of the Hadoop platform, beginning with its origins, and examine its architecture and key technologies. Second, we study the three existing Hadoop scheduling algorithms: the FIFO Scheduler, the Capacity Scheduler, and the Fair Scheduler. For each, we analyze its motivation, working principle, and shortcomings. Then, based on these shortcomings, we propose a new algorithm called the Dynamic Matching Based on Memory Scheduler (DMBMScheduler). The new algorithm considers memory data locality when scheduling jobs and schedules them on the principle of real-time matching. Finally, we implement the algorithm and evaluate it experimentally. The experimental results show that the algorithm achieves its expected goals: it shortens job execution time and response time, addresses the deficiencies of the existing algorithms, and improves the overall performance of the Hadoop platform.
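
The abstract does not describe how DMBMScheduler works internally, so the following is only an illustrative sketch of the general idea of preferring tasks whose input is already in a node's memory, set against plain FIFO selection. It is not the thesis's algorithm. The `Task` and `Node` types, the `cached_blocks` field, and both `pick_*` functions are hypothetical names introduced here, and the fallback to FIFO is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    job_id: int        # submission order; lower means submitted earlier
    input_block: str   # hypothetical identifier of the task's input block


@dataclass
class Node:
    name: str
    # Blocks assumed to be held in this node's memory (hypothetical model).
    cached_blocks: set = field(default_factory=set)


def pick_fifo(pending, node):
    """FIFO: give the free node the earliest-submitted pending task."""
    return min(pending, key=lambda t: t.job_id) if pending else None


def pick_memory_local(pending, node):
    """Prefer the earliest task whose input is already in the node's memory.

    Falls back to plain FIFO when no pending task is memory-local
    (an assumption made for this sketch).
    """
    local = [t for t in pending if t.input_block in node.cached_blocks]
    return pick_fifo(local or pending, node)
```

With pending tasks reading blocks `b1`, `b2`, and `b3` and a free node that holds `b3` in memory, `pick_fifo` returns the task for `b1`, while `pick_memory_local` returns the task for `b3` and so avoids loading that block again.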
Keywords/Search Tags:hadoop, mapreduce, job scheduling algorithm