Research And Implementation Of Local Priority Scheduling Algorithm Based On Mapreduce For Massive Data

Posted on: 2013-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Hua
Full Text: PDF
GTID: 2298330422474310
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the continuous development of network information technology and the rapid expansion of massive-data applications, a computing cluster built by a single enterprise can no longer meet the challenges of efficient management and optimal computing posed by the steady growth of massive data. The idea of pushing computation to the cloud, known as cloud computing, was therefore proposed by enterprises and research institutions. Today the concept of cloud computing is widely accepted by enterprises and research institutions and has achieved a great deal in stability and practicality.

Among those achievements, MapReduce is a typical solution that plays a significant role in distributed computing over massive data. Hadoop, which implements the core functionality of MapReduce, has become an important base platform for studying MapReduce distributed computing because it is open source. The work in this thesis is based on Hadoop.

The job scheduling problem in the MapReduce distributed computing model has a significant effect on the performance and stability of the whole system. To address the poor data locality of existing job scheduling algorithms, this thesis proposes a local-priority job scheduling algorithm. The proposed algorithm uses a new approach to resolve the conflict between data locality and system load balance, optimizing load balance by scheduling jobs according to priority level. As a consequence, the I/O overhead of the computation is reduced while data locality is guaranteed, which improves system throughput and reduces the execution time of a single job.

In this thesis, the local-priority job scheduling algorithm is designed, implemented, and verified in the MapReduce programming model on top of the HDFS distributed storage system. As the experimental results show, the system I/O overhead is effectively reduced and the execution time of a single job is markedly reduced, while data locality is fully guaranteed.
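
The abstract does not give the details of the algorithm, so the following is only a minimal Python sketch of the general idea of locality-first assignment by priority level, not the thesis's implementation. The names `Task`, `locality_level`, `pick_task`, and the `rack_of` mapping are hypothetical, and the three priority levels (node-local, rack-local, remote) are an assumption borrowed from common Hadoop terminology. The load-balancing side of the proposed algorithm is not modelled here.

```python
from dataclasses import dataclass

# Assumed priority levels; a lower value means a higher scheduling priority.
NODE_LOCAL, RACK_LOCAL, REMOTE = 0, 1, 2


@dataclass(frozen=True)
class Task:
    task_id: str
    replica_nodes: frozenset  # nodes holding a replica of the task's input block


def locality_level(task, node, rack_of):
    """Return the priority level of running `task` on `node`."""
    if node in task.replica_nodes:
        return NODE_LOCAL
    if any(rack_of[r] == rack_of[node] for r in task.replica_nodes):
        return RACK_LOCAL
    return REMOTE


def pick_task(pending, node, rack_of):
    """Remove and return the pending task with the best locality for `node`."""
    if not pending:
        return None
    # min() returns the first minimal element, so earlier tasks win ties (FIFO).
    best = min(pending, key=lambda t: locality_level(t, node, rack_of))
    pending.remove(best)
    return best


# Hypothetical three-node cluster spread over two racks.
rack_of = {"n1": "r1", "n2": "r1", "n3": "r2"}
pending = [
    Task("t1", frozenset({"n3"})),
    Task("t2", frozenset({"n1"})),
    Task("t3", frozenset({"n2"})),
]
order = [pick_task(pending, "n1", rack_of).task_id for _ in range(3)]
print(order)
```

In this sketch a node that asks for work first receives a task whose input block it already stores, then one stored in its own rack, and only then a task whose data must be read from another rack, which is the mechanism by which locality-aware scheduling cuts I/O overhead.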
Keywords/Search Tags: Load balancing, data locality, MapReduce, cloud computing