
Performance Analysis and Optimization of MapReduce

Posted on: 2013-12-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z He
Full Text: PDF
GTID: 2248330392957662
Subject: Communication and Information System
Abstract/Summary:
With the growing popularity of the Internet and the proliferation of data processing requirements, cloud computing, with its cost effectiveness, powerful computing and storage capacity, good security, and many other fine features, has attracted increasing interest from major IT companies. MapReduce is a programming model on cloud infrastructure for the distributed execution of user-submitted jobs on a cluster. Its job scheduling and execution performance is a matter of concern to its users.

This dissertation discusses several directions for improving the performance of MapReduce. After examining optimization techniques from China and abroad, it presents a multi-queue scheduling strategy for environments with multiple user groups that also supports dividing jobs by type. The scheduler achieves two-step scheduling by dividing the job queues into waiting queues and running queues. It supports multiple user groups through multiple queues. The computing resources of idle queues can be occupied by other queues to avoid waste. Occupied resources can be reclaimed when needed, and preemption is supported. The ping-pong effect is prevented by logically dividing the queues into a "shared queue list" and a "non-shared queue list". The scheduler supports job-type division, which gives flexibility in assigning different types of jobs so as to improve the utilization of each node's hardware and shorten job response time.

Furthermore, this dissertation studies methods for enhancing network performance in terms of task reduction. By merging outputs together, the Map output becomes more compact, the number of local files is reduced, and the size of a single output file is enlarged. In the shuffle phase of Reduce, thanks to long polling, the number of network connections is cut down and network I/O performance is improved.

A method is proposed for analyzing the performance of the programming model through the life cycle of jobs, task throughput, and the key functions in job execution. Finally, the performance of the implementation of our optimization is tested and analyzed.
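
As an illustration only, the borrowing and reclaiming of idle queue resources described in the abstract might be sketched as follows. This is a minimal Python sketch under stated assumptions, not the scheduler implemented in the thesis: the names `GroupQueue`, `schedule`, and `reclaim` are hypothetical, resources are modelled as plain task slots, and the rule that a queue leaves the shared list once it has reclaimed its slots is only one assumed way of avoiding the ping-pong effect.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class GroupQueue:
    """One user group's queue (hypothetical structure, not from the thesis)."""
    name: str
    capacity: int                                   # task slots guaranteed to this group
    shared: bool = True                             # True: on the "shared queue list"
    waiting: deque = field(default_factory=deque)   # jobs not yet admitted
    running: list = field(default_factory=list)     # (job, owner queue name) pairs

    def idle_slots(self) -> int:
        return self.capacity - len(self.running)


def schedule(queues):
    """Two-step scheduling: own slots first, then idle slots of shared queues."""
    # Step 1: each queue moves its waiting jobs into its own free slots.
    for q in queues:
        while q.waiting and q.idle_slots() > 0:
            q.running.append((q.waiting.popleft(), q.name))
    # Step 2: queues with jobs left over occupy idle slots of shared queues.
    for q in queues:
        for lender in queues:
            if lender is q or not lender.shared:
                continue
            while q.waiting and lender.idle_slots() > 0:
                lender.running.append((q.waiting.popleft(), q.name))


def reclaim(q, queues):
    """Preempt borrowed slots of q when q's own jobs need them."""
    by_name = {x.name: x for x in queues}
    needed = len(q.waiting) - q.idle_slots()
    kept = []
    for job, owner in q.running:
        if owner != q.name and needed > 0:
            by_name[owner].waiting.appendleft(job)  # preempted job waits again
            needed -= 1
        else:
            kept.append((job, owner))
    if len(kept) != len(q.running):
        # Assumption: after reclaiming, q moves to the non-shared list so its
        # slots are not lent out again straight away (no ping-pong).
        q.shared = False
    q.running = kept
    while q.waiting and q.idle_slots() > 0:
        q.running.append((q.waiting.popleft(), q.name))
```

In this sketch a real scheduler would also need some policy for returning a queue to the shared list; that policy is left out because the abstract does not describe it.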
Keywords/Search Tags: Cloud computing, MapReduce, Scheduling strategy, Performance analysis