
The Performance Optimization Of MapReduce In Cloud Computing

Posted on: 2018-03-22
Degree: Master
Type: Thesis
Country: China
Candidate: Q Q Dong
Full Text: PDF
GTID: 2348330536479650
Subject: Computer application technology
Abstract/Summary:
With the development and popularization of Internet technology, the demand for large-scale data processing keeps growing, and traditional parallel computers struggle to provide enough storage space and computing resources to meet it. Cloud computing provides a good environment for processing massive data. MapReduce is a distributed computing model for massive data processing in cloud computing that simplifies the development of distributed parallel programs, and Hadoop is its open-source implementation, capable of handling large amounts of data. However, the MapReduce programming model still has performance problems, so this thesis optimizes the performance of MapReduce from the perspective of task scheduling. The main contents of the thesis are as follows. First, the thesis introduces cloud computing and the Hadoop platform, focusing on the MapReduce computing model and its task scheduling algorithms. Based on an analysis of the shortcomings of the MapReduce scheduling algorithms, it proposes a new scheduling algorithm, the ant colony simulated annealing algorithm. In this algorithm, a locally optimal solution is constructed with the ant colony algorithm to reduce task completion time and keep the resource load balanced. Simulated annealing then refines this solution through local search, accepting the current search result with a certain probability so that the algorithm does not become trapped in a local optimum. In addition, to address the shortcomings of the fault-tolerance technology in the MapReduce programming model, the thesis proposes a reliability-oriented task scheduling strategy that introduces a failure recovery mechanism. The strategy evaluates the trustworthiness of resource nodes in the cloud environment and builds a trustworthiness model, which avoids assigning tasks to low-reliability nodes, where failures would force tasks to be re-executed and waste time and resources. The validity and stability of the proposed task scheduling algorithm and scheduling model are verified on the simulation platform CloudSim.
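
The probabilistic acceptance step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the ant colony construction phase is not shown (the starting assignment is simply passed in), and the makespan objective, the cooling schedule, and all names (`makespan`, `anneal`, `task_lengths`, `node_speeds`) are assumptions chosen for illustration.

```python
import math
import random


def makespan(assignment, task_lengths, node_speeds):
    """Completion time of the busiest node for a task -> node assignment."""
    load = [0.0] * len(node_speeds)
    for task, node in enumerate(assignment):
        load[node] += task_lengths[task] / node_speeds[node]
    return max(load)


def anneal(assignment, task_lengths, node_speeds,
           temp=10.0, cooling=0.95, steps=2000, seed=0):
    """Refine an initial assignment (for example, one built by an ant colony pass)."""
    rng = random.Random(seed)
    current = list(assignment)
    current_cost = makespan(current, task_lengths, node_speeds)
    best, best_cost = list(current), current_cost
    for _ in range(steps):
        # Neighbour: move one randomly chosen task to a randomly chosen node.
        candidate = list(current)
        candidate[rng.randrange(len(candidate))] = rng.randrange(len(node_speeds))
        cost = makespan(candidate, task_lengths, node_speeds)
        delta = cost - current_cost
        # Metropolis rule: always accept improvements, and accept a worse
        # result with probability exp(-delta / temp) to escape local optima.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        # Assumed geometric cooling; the floor avoids division by zero.
        temp = max(temp * cooling, 1e-9)
    return best, best_cost
```

Calling `anneal([0] * 8, [4, 3, 2, 1, 4, 3, 2, 1], [1.0, 2.0])` starts with every task on node 0 and returns an assignment whose makespan is no worse than that starting point. As the temperature falls, worse moves are accepted less often, so the search gradually shifts from exploration to pure local improvement.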
Keywords/Search Tags:Cloud computing, MapReduce, performance optimization, task scheduling