Research And Implementation Of The MapReduce Jobs Composition System

Posted on: 2014-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: C J Zhu
Full Text: PDF
GTID: 2248330392961054
Subject: Computer Science and Technology
Abstract/Summary:
MapReduce is a programming model for massive data processing in cloud computing environments. A single MapReduce job is a program with limited functionality, so different MapReduce jobs must collaborate with each other to complete a complex task. Although MapReduce-oriented open-source workflow engines such as Oozie exist, Oozie does not directly support iterative computation and lacks a visual development environment, which makes it difficult to use in real projects. This paper is based on an actual project to build a provincial transportation and logistics cloud computing platform. According to the project requirements, a repository of MapReduce jobs and templates for massive data processing must be established, so that people with no MapReduce programming experience can compose jobs from the repository to develop massive data processing applications.

To address the problems mentioned above, and based on an in-depth analysis of the actual requirements and related technologies, this paper researches the key technologies involved: the composition modes of MapReduce jobs, the jobs composition description language JCDL, and the methods that convert JCDL into Oozie hPDL and into MapReduce code. In addition, this paper designs and implements a visual MapReduce composition prototype system that supports a jobs and templates repository. Experiments demonstrate that the system is feasible and effective.

This paper has the following content:

(1) For the problem of describing the relationships between MapReduce jobs, this paper proposes a variety of MapReduce jobs composition modes, covering dependencies between MapReduce jobs such as sequence, fork, link and loop.

(2) For the problem of defining a MapReduce jobs composition, this paper proposes JCDL (Jobs Composition Description Language), which is based on XML, and proposes a conversion algorithm between JCDL and its graphical representation.

(3) For the problem of executing composed jobs, this paper proposes two methods. In the first method, JCDL is converted into hPDL, and an application that runs in the Oozie environment is then automatically created, deployed and executed. The advantage of this method is that it takes full advantage of Oozie's high fault tolerance. In the second method, a MapReduce program is generated from the JCDL and then automatically submitted and executed on Hadoop. The advantage of this method is that the application runs more efficiently.

(4) Based on the above work, this paper designs and implements a visual MapReduce Jobs Composition System (MRJCS) that supports a jobs and templates repository. The prototype system consists of a visual jobs composition tool, a JCDL validation subsystem, a MapReduce jobs and templates repository management subsystem, a JCDL execution engine based on Oozie, and a JCDL execution engine based on MapReduce.

(5) This paper uses a BP neural network classification application to verify the feasibility and effectiveness of MRJCS. First, the improved implementation of the BP neural network algorithm is introduced. Then, a classification application is created with MRJCS. The experiment on the KDD Cup 1999 data set shows that the composed job completes the classification process with a classification accuracy of 91.6%, and that the efficiency of the MapReduce-based execution method is 11.7% higher than that of the Oozie-based execution method.
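To make the first execution method in item (3) more concrete, the following is a minimal sketch of how a purely sequential composition might be turned into an Oozie hPDL workflow definition. It is not the thesis's conversion algorithm, and it does not use JCDL syntax, which the abstract does not show. The function name `sequence_to_hpdl`, the tuple-based input format, and the example class names are all assumptions made for illustration; only the hPDL element names and the `mapred.mapper.class` / `mapred.reducer.class` properties follow Oozie's documented workflow format.

```python
import xml.etree.ElementTree as ET


def sequence_to_hpdl(app_name, jobs):
    """Build an Oozie workflow-app element that runs `jobs` one after another.

    `jobs` is a list of (action_name, mapper_class, reducer_class) tuples.
    This input shape is hypothetical; it stands in for a parsed composition.
    """
    if not jobs:
        raise ValueError("at least one job is required")

    root = ET.Element(
        "workflow-app",
        {"name": app_name, "xmlns": "uri:oozie:workflow:0.4"},
    )
    # The workflow starts at the first job in the sequence.
    ET.SubElement(root, "start", {"to": jobs[0][0]})

    for i, (name, mapper, reducer) in enumerate(jobs):
        # On success, move to the next job, or to the end node after the last.
        next_node = jobs[i + 1][0] if i + 1 < len(jobs) else "end"

        action = ET.SubElement(root, "action", {"name": name})
        mr = ET.SubElement(action, "map-reduce")
        ET.SubElement(mr, "job-tracker").text = "${jobTracker}"
        ET.SubElement(mr, "name-node").text = "${nameNode}"

        conf = ET.SubElement(mr, "configuration")
        for key, value in (
            ("mapred.mapper.class", mapper),
            ("mapred.reducer.class", reducer),
        ):
            prop = ET.SubElement(conf, "property")
            ET.SubElement(prop, "name").text = key
            ET.SubElement(prop, "value").text = value

        ET.SubElement(action, "ok", {"to": next_node})
        # Any failing job routes to a single kill node.
        ET.SubElement(action, "error", {"to": "fail"})

    kill = ET.SubElement(root, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = (
        "Job failed: ${wf:errorMessage(wf:lastErrorNode())}"
    )
    ET.SubElement(root, "end", {"name": "end"})
    return root


if __name__ == "__main__":
    # Hypothetical two-step composition: clean the data, then train a model.
    workflow = sequence_to_hpdl(
        "demo-composition",
        [
            ("clean", "example.CleanMapper", "example.CleanReducer"),
            ("train", "example.TrainMapper", "example.TrainReducer"),
        ],
    )
    print(ET.tostring(workflow, encoding="unicode"))
```

The sketch covers only the sequence mode. Fork dependencies would presumably map onto Oozie's `fork` and `join` nodes, while loops have no direct hPDL counterpart, which is consistent with the abstract's remark that Oozie does not directly support iterative computation.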
Keywords/Search Tags: MapReduce, Jobs composition, Hadoop, Oozie, Code generating