Design And Implementation Of Traffic And Logistics Big Data Process System Based On Hadoop

Posted on: 2015-07-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Wang
GTID: 2298330452464183
Subject: Software engineering
Abstract/Summary:
In recent years, big data processing technologies have played an increasingly significant role in decision analysis and operations across many industries and government departments. This thesis is set against the background of a provincial transport and logistics platform project, which requires building a big data processing service, based on MapReduce and Hadoop, for the SaaS applications of thousands of transport companies and government departments. Because developing big data processing applications for transport and logistics involves several roles, including transport and logistics operations specialists, data analysts, and application developers, achieving collaborative and agile development is a critical issue. Moreover, although Hadoop Oozie supports assembling big data processing applications, Oozie executes them inefficiently because of data dependencies between nodes, and resolving this is also a practical problem that needs to be solved.

In response to these problems, and based on an analysis of MapReduce, Hadoop, Oozie, and related technologies, this thesis proposes a partial-parallel execution optimization scheme for the Oozie workflow engine, and designs and implements a transport and logistics data processing system that supports collaborative development and workflow assembly. Tests and practical use show that the system is both feasible and effective.

Compared with similar systems, this thesis has the following features:

1) To address the low efficiency of Oozie, this thesis proposes a partial-parallel execution optimization scheme for MapReduce workflows. Because the reduce tasks of a MapReduce job finish at different times, the downstream node can be started once some of the upstream node's reduce tasks have completed, without waiting for the upstream node to finish entirely. In this way, the upstream and downstream nodes can run partially in parallel.

2) Based on Hadoop, this thesis designs and implements a new MapReduce framework that supports appending input to a running MapReduce job. This framework provides the underlying basis for the partial-parallel MapReduce workflow engine.

3) Based on Oozie, this thesis implements a workflow engine that supports the partial-parallel execution algorithm. The engine has two workflow executors and can choose between them according to the workflow instance. The evaluation shows that when the number of upstream nodes is larger than the number of reduce slots in the cluster, efficiency can be improved by 19%.

4) To address the collaborative development issue, this thesis provides a development environment for data processing components. The environment is based on the Hadoop Eclipse plugin with an added sandbox function, and developers can use it to develop, test, and deploy components.
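The partial-parallel idea in feature 1) can be illustrated with a toy timing model. This is a minimal simulation under stated assumptions, not the thesis's engine: it assumes every upstream reduce task is runnable at time 0 and competes for a fixed number of reduce slots, that upstream reduce output i feeds exactly one downstream map task i, and that upstream and downstream tasks use separate slot pools. The function names `schedule` and `workflow_makespan` are hypothetical.

```python
import heapq


def schedule(ready_times, durations, slots):
    """Greedy list scheduling on `slots` identical slots.

    Tasks are dispatched in order of their ready time; each starts at the later
    of its ready time and the moment the earliest slot becomes free.
    Returns the finish time of every task, in input order.
    """
    free_at = [0.0] * slots  # min-heap: time at which each slot becomes free
    finish = [0.0] * len(ready_times)
    for i in sorted(range(len(ready_times)), key=lambda k: ready_times[k]):
        start = max(heapq.heappop(free_at), ready_times[i])
        finish[i] = start + durations[i]
        heapq.heappush(free_at, finish[i])
    return finish


def workflow_makespan(up_reduce, down_map, reduce_slots, map_slots, partial):
    """Finish time of the downstream map phase in a two-node workflow.

    Toy-model assumption: downstream map task i reads only the output of
    upstream reduce task i, so both lists must have the same length.
    """
    # Upstream reduce tasks are all runnable at t = 0 (its map phase is ignored).
    up_finish = schedule([0.0] * len(up_reduce), up_reduce, reduce_slots)
    if partial:
        # Partial-parallel: each downstream map starts as soon as its input exists.
        ready = up_finish
    else:
        # Standard execution: the downstream node waits for the whole upstream node.
        barrier = max(up_finish)
        ready = [barrier] * len(up_finish)
    return max(schedule(ready, down_map, map_slots))


if __name__ == "__main__":
    # 8 upstream reduce tasks on only 4 reduce slots run in two waves.
    up, down = [10.0] * 8, [5.0] * 8
    print(workflow_makespan(up, down, 4, 4, partial=False))  # 30.0
    print(workflow_makespan(up, down, 4, 4, partial=True))   # 25.0
```

In this model the first wave's downstream maps overlap with the second upstream wave, which is where the saving comes from. With `reduce_slots=8` all upstream reduce tasks finish together and both modes return 20.0, so the gain only appears when the upstream reduce tasks outnumber the available slots; the sketch makes no claim about the size of the gain measured in the thesis.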
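Feature 2) relies on a job whose input can grow while it is running, so that a downstream job can start before all of its input exists. The following is a hypothetical, in-process Python sketch of that contract, not Hadoop's API and not the thesis's framework: map workers pull input splits from a queue that callers may append to at any time, `close_input()` signals that no more splits will arrive, and the reduce step then merges the per-split results. The class `AppendableJob` and its methods are invented for illustration, and word count stands in for an arbitrary map/reduce pair.

```python
import queue
import threading
from collections import Counter


class AppendableJob:
    """Toy word-count job that accepts new input splits while it is running."""

    _END = object()  # sentinel telling a map worker that input is closed

    def __init__(self, num_mappers=2):
        self._num_mappers = num_mappers
        self._splits = queue.Queue()
        self._partials = []  # one Counter per processed split
        self._lock = threading.Lock()
        self._workers = [
            threading.Thread(target=self._map_loop) for _ in range(num_mappers)
        ]
        for worker in self._workers:
            worker.start()

    def append_input(self, split):
        """Add a split (here, a string of words) to the running job."""
        self._splits.put(split)

    def close_input(self):
        """Declare that no more splits will arrive.

        Splits appended after this call are never processed.
        """
        for _ in range(self._num_mappers):
            self._splits.put(self._END)

    def _map_loop(self):
        # Map phase: count the words of each split as soon as it is appended.
        while True:
            split = self._splits.get()
            if split is self._END:
                return
            counts = Counter(split.split())
            with self._lock:
                self._partials.append(counts)

    def wait_for_result(self):
        """Wait for all map workers, then reduce (merge) the partial counts.

        Blocks forever if close_input() is never called.
        """
        for worker in self._workers:
            worker.join()
        total = Counter()
        for counts in self._partials:
            total.update(counts)
        return dict(total)


if __name__ == "__main__":
    job = AppendableJob(num_mappers=2)
    job.append_input("a b a")
    job.append_input("b c")
    # Later, e.g. when another upstream reduce task finishes:
    job.append_input("a c c")
    job.close_input()
    print(sorted(job.wait_for_result().items()))  # [('a', 3), ('b', 2), ('c', 3)]
```

Because the sentinels are queued after every split and the queue is FIFO, no worker exits while splits are still waiting, and `join()` covers any split a sibling worker is still processing.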
Keywords/Search Tags: Big Data Processing, MapReduce, Hadoop, Oozie