Optimization And Implementation Of Distributed Parallel Processing For Iterative Jobs

Posted on: 2015-10-03  Degree: Master  Type: Thesis
Country: China  Candidate: B B Xu  Full Text: PDF
GTID: 2308330452957213  Subject: Computer technology
Abstract/Summary:
Google MapReduce is a parallel programming model for large-scale data sets and is now very widely used. Iterative computation is common in data mining, machine learning, and other areas of scientific computing. However, existing MapReduce implementations such as Hadoop cannot support iterative data processing efficiently, so improving the performance of traditional MapReduce on iterative problems is important.

This thesis analyzes in depth parallel programming on MapReduce, iterative jobs, and existing iterative improvements to the MapReduce model, and summarizes the problems of the existing methods. When handling iterative jobs, traditional MapReduce does not distinguish between static and dynamic data, and it writes the intermediate results of each iteration to the distributed file system. Judging the iteration termination condition also consumes an additional MapReduce job. MapReduce does nothing to relax the dependency between iterations either: they are executed strictly in serial order, like ordinary jobs.

An iterative parallel processing model based on temporary jobs is proposed. To treat dynamic and static data differently, and to avoid writing intermediate iteration results to the distributed file system, a local cache structure is designed to store the static data and the intermediate results of the iteration. To relax the strict dependency between iterations in the original MapReduce, a new iterative control loop structure is designed, which uses temporary jobs to join two iterations so that successive iterations can execute in parallel.
To remove the additional overhead of judging the termination condition, a new iteration termination condition detector is designed; the judgment is made automatically inside the iterative process and runs concurrently with the next iteration.

The iterative processing model is described in detail, including the parallel design of the iterative data flow, iterative cache control, iterative loop control, the design of the iteration termination condition, and the design of temporary jobs. Two common iterative computations were selected, and comparative experiments were carried out on the Hadoop platform and on the improved prototype system. The experimental results show that the improved model achieves better performance on iterative jobs than Hadoop.
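The three ideas in the abstract (keeping static data and intermediate results in a local cache, overlapping successive iterations, and checking termination alongside the next iteration) can be illustrated with a small single-machine sketch. It is not the thesis's implementation: PageRank is chosen here only as a familiar iterative computation, in-memory dictionaries stand in for the local cache, a thread pool stands in for temporary jobs, and the names `pagerank_step`, `l1_distance`, and `run_iterative` are hypothetical. The sketch also assumes every node has at least one outgoing link.

```python
from concurrent.futures import ThreadPoolExecutor

DAMPING = 0.85  # conventional PageRank damping factor (illustrative choice)


def pagerank_step(links, ranks):
    """One iteration: the 'map' side emits contributions, the 'reduce' side sums them."""
    n = len(links)
    contrib = {node: 0.0 for node in links}
    for node, outs in links.items():      # static data is read from memory, never reloaded
        share = ranks[node] / len(outs)
        for dst in outs:
            contrib[dst] += share
    return {node: (1 - DAMPING) / n + DAMPING * c for node, c in contrib.items()}


def l1_distance(old, new):
    """Termination metric: total absolute change between two rank vectors."""
    return sum(abs(new[k] - old[k]) for k in old)


def run_iterative(links, tol=1e-6, max_iters=100):
    """Iterate until the change drops below tol, overlapping the check with the next step."""
    n = len(links)
    prev = {node: 1.0 / n for node in links}   # dynamic state, kept in memory
    curr = pagerank_step(links, prev)
    iters = 1
    with ThreadPoolExecutor(max_workers=2) as pool:
        while iters < max_iters:
            # The termination check and the next iteration are submitted together,
            # so the check does not cost a separate pass before work can continue.
            check = pool.submit(l1_distance, prev, curr)
            speculative = pool.submit(pagerank_step, links, curr)
            if check.result() < tol:
                speculative.cancel()           # result is simply discarded if already running
                break
            prev, curr = curr, speculative.result()
            iters += 1
    return curr, iters


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    ranks, iters = run_iterative(graph)
    print(iters, ranks)
```

In a real distributed setting the speculative step would be a scheduled job and the cache would live on each worker's local disk or memory; the sketch only shows the control flow in which no intermediate result passes through a shared file system and the convergence test is not a separate serial stage.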
Keywords/Search Tags: Distribution, Parallel, Iterative Calculation, Temporary Job