Design And Implementation Of Traffic And Logistics Big Data Process System Based On Hadoop

Posted on:2015-07-12

Degree:Master

Type:Thesis

Country:China

Candidate:Y T Wang

Full Text:PDF

GTID:2298330452464183

Subject:Software engineering

Abstract/Summary:

In recent years, big data processing technologies have played an increasingly significant role in decision analysis and operations across industries and government departments. This thesis is set against the background of a provincial transport and logistics platform project, which requires building a big data processing service, based on MapReduce and Hadoop, for the SaaS applications of thousands of transport companies and government departments. Developing big data processing applications for transportation and logistics involves several roles, including transportation and logistics operations specialists, data analysts, and application developers, so how to achieve collaborative and agile development is a critical issue. Moreover, although Hadoop Oozie supports the assembly of big data processing applications, its low execution efficiency caused by data dependencies between nodes is a practical problem that also needs to be solved.

In response to these problems, and based on an analysis of MapReduce, Hadoop, Oozie, and related technologies, this thesis proposes a partial-parallel execution optimization plan for the Oozie workflow engine, and designs and implements a transport and logistics data processing system that supports collaborative development and workflow assembly. Tests and practical use show that the system is both feasible and effective.

Compared with similar systems, this work has the following features:

1) To address the low efficiency of Oozie, this thesis proposes a partial-parallel execution optimization plan for MapReduce workflows. Because the reduce tasks in a MapReduce job finish at different times, the downstream node can be started once some reduce tasks have completed, without waiting for the upstream node to finish entirely. In this way, the upstream and downstream nodes can run partially in parallel.

2) Based on Hadoop, this thesis proposes and implements a new MapReduce framework that supports appending input to a running MapReduce job. This framework provides the underlying basis for the partial-parallel MapReduce workflow engine.

3) Based on Oozie, this thesis implements a workflow engine that supports the partial-parallel execution algorithm. The new workflow engine has two workflow executors and can choose between them according to the workflow instance. The evaluation shows that when the number of upstream nodes is larger than the number of reduce slots in the cluster, efficiency improves by 19%.

4) To address collaborative development, this thesis provides a development environment for data processing components. The environment is based on the Hadoop Eclipse plugin and adds a sandbox function, so developers can develop, test, and deploy components with it.
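The partial-parallel idea in feature 1 can be illustrated with a toy timing model. The sketch below is not the thesis's algorithm and uses no Oozie or Hadoop API. It assumes a single downstream worker that handles one upstream reduce output at a time, and the function names, completion times, and costs are all hypothetical values chosen for illustration.

```python
def sequential_makespan(reduce_finish, downstream_cost):
    """Downstream starts only after every upstream reduce task has finished."""
    return max(reduce_finish) + sum(downstream_cost)


def partial_parallel_makespan(reduce_finish, downstream_cost):
    """Downstream consumes each reduce output as soon as it becomes available."""
    clock = 0.0
    # Handle outputs in the order the upstream reduce tasks complete.
    for finish, cost in sorted(zip(reduce_finish, downstream_cost)):
        # Wait for the output if it is not ready yet, then process it.
        clock = max(clock, finish) + cost
    return clock


# Hypothetical numbers: three reduce tasks finishing at different times,
# each producing an output that takes 3 time units to process downstream.
reduce_finish = [10.0, 12.0, 20.0]
downstream_cost = [3.0, 3.0, 3.0]

seq = sequential_makespan(reduce_finish, downstream_cost)
par = partial_parallel_makespan(reduce_finish, downstream_cost)
print(f"sequential: {seq}, partial-parallel: {par}")
```

In this toy model the gain comes entirely from the spread in reduce completion times: if all reduce tasks finished at the same moment, the two schedules would take equally long.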

Keywords/Search Tags:

Big Data Processing, MapReduce, Hadoop, Oozie

Related items

1	Research On Big Data Processing System Based On MapReduce Parallel Processing Framework
2	Design And Implementation Of Metadata Management System For Big Data Processing Of Transportation Logistics
3	Research And Implementation Of The MapReduce Jobs Composition System
4	Massive Data Processing Based On Hadoop2.0
5	Research And Application On Big Data Processing Based On Hadoop Platform
6	Research And Application Of Data Processing Based On Hadoop
7	The Research And Analysis Of Hadoop Small File Processing Method
8	Design And Implementation Of The Data Analysis System Based On Hadoop
9	Research And Implementation On Incremental Data Processing Algorithm Based On Hadoop
10	Vehicle Routing Data Processing System Based On Hadoop And C4.5 Algorithm