
An Optimized MapReduce Workflow Scheduling Algorithm For Heterogeneous Computing

Posted on: 2014-04-02  Degree: Master  Type: Thesis
Country: China  Candidate: M Liu  Full Text: PDF
GTID: 2268330425483706  Subject: Information and Communication Engineering
Abstract/Summary:
Amid today's explosive growth of information, cloud computing, with its high computing performance and massive data storage capacity, has received a great deal of attention and come into widespread use. However, as more and more powerful applications and software are developed for the cloud computing environment, their logical relations are also becoming increasingly complex. Data processing is thus no longer a simple computing problem but a more complicated one shaped by time, cost, resource, and priority constraints, among other factors. General information systems are usually unable to take full control of such complex applications. To address this problem, cloud workflow, in which workflows are deployed in a cloud environment, has been proposed and has become a hot research topic. In this way, cloud computing resources can be fully exploited, and the entire application flow can be flexibly constructed, managed, executed, and monitored through the workflow.

In existing service orchestration, bandwidth bottlenecks are hard to avoid when big data is processed, since all data has to be transmitted through the central engine. This paper analyzes the framework of cloud workflow and proposes an optimized service orchestration framework. The optimized framework introduces an intermediate agent layer and improves the workflow engine. The intermediate agents are in charge of the nodes, so that only control information is transmitted between the agents and the workflow engine. Point-to-point data transmission can be used between the underlying nodes, which greatly reduces the occurrence of bottlenecks. Moreover, the optimized workflow engine can assign different types of tasks to appropriate underlying cloud platforms (such as Hadoop) or to basic processing nodes for data processing.
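The separation of control traffic from data traffic described above can be illustrated with a minimal sketch. This is not the thesis's implementation: the class names `Node`, `Agent`, and `Engine`, and the shape of the control command, are hypothetical and chosen only for illustration. The point is that the engine handles small control tuples, while the payload moves directly from one node to another.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical processing node that holds data blocks by key."""
    name: str
    store: dict = field(default_factory=dict)

    def send(self, key, target):
        # Point-to-point transfer: the payload goes straight to the target node.
        target.store[key] = self.store[key]


@dataclass
class Agent:
    """Hypothetical intermediate agent in charge of a group of nodes."""
    nodes: dict  # node name -> Node

    def handle(self, command):
        # A command carries control information only: (key, source, destination).
        key, src, dst = command
        self.nodes[src].send(key, self.nodes[dst])
        return "done"


class Engine:
    """Hypothetical workflow engine: it issues commands and never sees payloads."""

    def __init__(self, agent):
        self.agent = agent

    def run(self, commands):
        return [self.agent.handle(c) for c in commands]


a, b = Node("a"), Node("b")
a.store["block-1"] = b"x" * 1024
engine = Engine(Agent({"a": a, "b": b}))
statuses = engine.run([("block-1", "a", "b")])
```

In this sketch the engine only ever receives the status strings returned by the agent, so the volume of data passing through it does not grow with the size of the blocks being moved.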
As MapReduce is a typical parallel, scalable programming model for big data processing in cloud computing, the investigation starts from the MapReduce workflow and proposes a workflow scheduling model for MapReduce. The proposed model can be regarded as a simple microcosm of cloud workflow, since MapReduce is a typical framework in the cloud environment, and the research results will benefit follow-up studies of cloud workflow.

Existing workflows composed of MapReduce tasks usually split scheduling into two separate parts: the scheduling of workflow task priorities and the scheduling of the underlying MapReduce jobs. This uses resources inefficiently, because a large number of time fragments are introduced during scheduling. This paper proposes MRWS (MapReduce-enabled Workflow Scheduler), an optimized scheduling algorithm for heterogeneous environments. Experimental results show that the proposed algorithm makes much fuller use of the time fragments introduced during the scheduling process, and therefore improves resource availability and the efficiency of process execution.
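The abstract does not give the details of MRWS, so the following is only a generic sketch of how idle time fragments can be reused on heterogeneous nodes. It uses a simple insertion-based placement policy, not the thesis's algorithm. The function names, the `speed` field, and the representation of a node's schedule as a sorted list of busy intervals are all assumptions made for illustration.

```python
import bisect


def earliest_slot(busy, ready, duration):
    """Earliest start >= ready at which `duration` fits around the busy intervals.

    `busy` is a sorted list of non-overlapping (start, end) tuples.
    """
    start = ready
    for b_start, b_end in busy:
        if start + duration <= b_start:
            break  # the task fits in the idle gap before this interval
        start = max(start, b_end)
    return start


def place(task_work, ready, nodes):
    """Place one task on the node that finishes it earliest.

    `nodes` maps a node name to {"speed": float, "busy": [(start, end), ...]}.
    A faster node (higher speed) needs less time for the same amount of work.
    """
    best = None
    for name, node in nodes.items():
        duration = task_work / node["speed"]
        start = earliest_slot(node["busy"], ready, duration)
        finish = start + duration
        if best is None or finish < best[2]:
            best = (name, start, finish)
    name, start, finish = best
    bisect.insort(nodes[name]["busy"], (start, finish))  # keep intervals sorted
    return best


nodes = {
    "fast": {"speed": 2.0, "busy": [(0, 4), (10, 12)]},
    "slow": {"speed": 1.0, "busy": [(0, 9)]},
}
placement = place(8, 0, nodes)  # lands in the idle gap on "fast" between 4 and 10
```

Here the task needs 4 time units on the faster node and is slotted into the gap between its two existing intervals, instead of being appended after time 12 or sent to the slower node, where it could not finish before time 17.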
Keywords/Search Tags: MRWS, MapReduce workflow, Hadoop, MapReduce, scheduling, heterogeneous cluster