Research And Implementation Of Local Priority Scheduling Algorithm Based On Mapreduce For Massive Data

Posted on:2013-04-24

Degree:Master

Type:Thesis

Country:China

Candidate:Z J Hua

Full Text:PDF

GTID:2298330422474310

Subject:Computer Science and Technology

Abstract/Summary:

In recent years, with the continuing development of network information technology and the rapid growth of massive-data applications, a computing cluster built by a single enterprise can no longer meet the challenges of efficient management and optimized computation posed by ever-growing data volumes. Enterprises and research institutions have therefore proposed moving computation to the cloud, an approach known as cloud computing. Today the concept of cloud computing is widely accepted by enterprises and research institutions and has made considerable progress in stability and practicality.

Among these achievements, MapReduce is a representative solution that plays a significant role in distributed computing over massive data. Hadoop, which implements the core functionality of MapReduce, has become an important base platform for studying MapReduce distributed computing because it is open source. The work in this thesis is based on Hadoop.

Job scheduling in the MapReduce distributed computing model has a significant effect on the performance and stability of the whole system. To address the poor data locality of existing job scheduling algorithms, this thesis proposes a local-priority job scheduling algorithm. The proposed algorithm uses a new approach to resolve the conflict between data locality and system load balance: it improves load balance by scheduling jobs according to their priority level. As a consequence, I/O overhead during computation is reduced while data locality is guaranteed, which improves system throughput and shortens the execution time of a single job.

In this thesis, the local-priority scheduling algorithm is designed, implemented, and verified in the MapReduce programming model on top of the HDFS distributed storage system. Experimental results show that system I/O overhead is effectively reduced and single-job execution time is markedly reduced, while data locality is fully guaranteed.
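
The abstract does not spell out the algorithm itself, so the following is only a generic illustration of the trade-off it describes between data locality and load balance, not the thesis's method. It is a minimal sketch in which higher-priority tasks choose first, each task prefers a node that already holds a replica of its input block, and ties are broken toward the least-loaded node. All names here (`assign_tasks`, `replicas`, `free_slots`) are hypothetical and do not correspond to Hadoop APIs.

```python
def assign_tasks(tasks, replicas, free_slots):
    """Illustrative locality-first assignment (hypothetical, not Hadoop's API).

    tasks      -- list of (task_id, priority); larger priority is served first
    replicas   -- dict: task_id -> set of nodes holding the task's input block
    free_slots -- dict: node -> number of free task slots
    Returns dict: task_id -> (node, is_local).
    """
    slots = dict(free_slots)  # work on a copy; the caller's dict is untouched
    assignment = {}

    # Serve tasks by descending priority; task id breaks ties deterministically.
    for task_id, _priority in sorted(tasks, key=lambda t: (-t[1], t[0])):
        # Nodes that hold the input data and still have a free slot.
        local = [n for n in replicas.get(task_id, ()) if slots.get(n, 0) > 0]
        if local:
            candidates, is_local = local, True
        else:
            # Fall back to any node with capacity; input must be read remotely.
            candidates = [n for n, s in slots.items() if s > 0]
            is_local = False
        if not candidates:
            break  # no capacity left; remaining tasks would wait

        # Load balance: pick the candidate with the most free slots.
        node = min(candidates, key=lambda n: (-slots[n], n))
        slots[node] -= 1
        assignment[task_id] = (node, is_local)

    return assignment


example = assign_tasks(
    tasks=[("t1", 1), ("t2", 3), ("t3", 2)],
    replicas={"t1": {"A"}, "t2": {"A", "B"}, "t3": {"A"}},
    free_slots={"A": 2, "B": 1},
)
```

In this example the two higher-priority tasks run on node A next to their data, while the lowest-priority task falls back to a remote read on node B once A is full. That fallback is the kind of extra I/O a locality-aware scheduler tries to minimize.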

Keywords/Search Tags:

Load balancing, data locality, MapReduce, cloud computing

Related items

1	Research And Implementation Of Local Priority Scheduling Algorithm Based On Mapreduce For Massive Data
2	The Research Of Load Balancing In Mapreduce Based On Sampling Estimation
3	Research On Load Balancing In The Construction Of Cloud Computing Data Center
4	Research And Implementation Of A Load Balancing Technology Based On Data Correlation In Cloud Computing
5	Research On Load Balancing Strategy Based On SLA Optimization In Cloud Computing
6	Research On Load Balancing Algorithm For Scheduling Based On Hadoop
7	Research Of Load Balancing Strategy In Cloud Computing
8	Research And Implementation Of Load Balancing Based On Predicting Under Cloud Computing Environment
9	Research On I / O Bottleneck Solution Of Cloud Platform Network Based On Load Balancing
10	An Intermediate Data Placement Algorithm For Load Balancing In Spark Computing Environment