Optimization And Implementation Of Distributed Parallel Processing For Iterative Jobs

Posted on:2015-10-03

Degree:Master

Type:Thesis

Country:China

Candidate:B B Xu

Full Text:PDF

GTID:2308330452957213

Subject:Computer technology

Abstract/Summary:

Google's MapReduce is a parallel programming model for large-scale data sets and is now widely used. Iterative computation is common in data mining, machine learning, and other areas of scientific computing. However, existing MapReduce implementations such as Hadoop cannot support iterative data processing efficiently, so improving the performance of traditional MapReduce on iterative problems is important.

This thesis analyzes parallel programming on MapReduce, iterative jobs, and existing iterative improvements to the MapReduce model in depth, and summarizes the problems with the existing methods. When handling iterative jobs, traditional MapReduce does not distinguish between static and dynamic data, and it writes the intermediate results of each iteration to the distributed file system. Judging the iteration termination condition also consumes an additional MapReduce job. Furthermore, MapReduce does nothing to relax the dependency between iterations: they are executed strictly in serial order, like ordinary jobs.

An iterative parallel processing model based on temporary jobs is proposed. To treat dynamic and static data differently and to avoid writing intermediate iteration results to the distributed file system, a local cache structure is designed to store the static data and the intermediate results of each iteration. To address the strict dependency between iterations in the original MapReduce, a new iterative control loop structure is designed, which uses temporary jobs to join two iterations so that successive iterations can execute in parallel. To remove the additional overhead of judging the termination condition, a new iteration termination condition detector is designed: the check is performed automatically inside the iterative process and runs concurrently with the next iteration.

The iterative processing model is described in detail, including the parallel design of the iterative data flow, iterative cache control, iterative loop control, the design of the iteration termination condition, and the design of temporary jobs. Two common iterative computations were selected, and comparative experiments were carried out on both the Hadoop platform and the improved prototype system. The experimental results show that the improved model achieves better performance on iterative jobs than Hadoop.
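The two central ideas above, caching static data across iterations and checking the termination condition concurrently with the next iteration, can be illustrated with a minimal single-machine sketch. This is not the thesis's Hadoop-based prototype. It assumes PageRank as the iterative job, a thread pool as a stand-in for the temporary job that joins two iterations, and an L1-distance convergence test; all function and variable names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def pagerank_step(links, ranks, damping=0.85):
    """One iteration: a map phase emits rank shares, a reduce phase sums them.

    Assumes every node has at least one out-link.
    """
    n = len(links)
    contrib = {node: 0.0 for node in links}
    for node, outs in links.items():      # "map": one share per out-link
        share = ranks[node] / len(outs)
        for dst in outs:                  # "reduce": sum shares per target
            contrib[dst] += share
    return {node: (1 - damping) / n + damping * c for node, c in contrib.items()}


def converged(old, new, tol):
    """Termination test: L1 distance between consecutive rank vectors."""
    return sum(abs(new[k] - old[k]) for k in old) < tol


def iterate(links, tol=1e-6, max_iters=200):
    # Static data (the graph) is loaded once and reused by every iteration.
    static_cache = dict(links)
    # Dynamic data (the ranks) is the only state that changes per iteration.
    previous = {node: 1.0 / len(static_cache) for node in static_cache}
    current = pagerank_step(static_cache, previous)
    with ThreadPoolExecutor(max_workers=2) as pool:
        for i in range(1, max_iters + 1):
            # Check iteration i while iteration i + 1 is already computed,
            # instead of running a separate job for the check first.
            check = pool.submit(converged, previous, current, tol)
            speculative = pool.submit(pagerank_step, static_cache, current)
            if check.result():
                return current, i         # speculative result is discarded
            previous, current = current, speculative.result()
    return current, max_iters
```

Calling `iterate({"a": ["b", "c"], "b": ["c"], "c": ["a"]})` returns the final rank dictionary and the number of iterations used. The cost of this design is one wasted speculative iteration at the end, in exchange for never blocking an iteration on the convergence check.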

Keywords/Search Tags:

Distribution, Parallel, Iterative Calculation, Temporary Job

Related items

1	Parallel Calculation Of Object’s RCS Based On Facets Grouping Algorithm
2	The Temporary Exhibition In The Museums
3	Research And Application On Parallel And Iterative Decoding Methods Of Specific LDPC Codes
4	Study On Parallel Algorithm Of Large-scale Numerical Calculation In Cloud Computing Environment
5	Research On Calculation Method Of Distribution Network Power Loss Based On Fusion Marketing Data
6	Research And Development Of Visible Power Flow Calculation Software Of Distribution Networks
7	Iterative processing: From applications to parallel implementations
8	Research On User Clustering Oriented To Temporary Hotspot Resource Allocation
9	Research On Polar Code For High Throughput Application
10	Scalable parallel computing on clouds: Efficient and scalable architectures to perform pleasingly parallel, MapReduce and iterative data intensive computations on cloud environments