
Computing Resource Utilization Analysis and Multi-Job Scheduling Algorithm Design of MapReduce/Hadoop

Posted on: 2017-02-22 | Degree: Master | Type: Thesis
Country: China | Candidate: X L Yu
GTID: 2308330485482227 | Subject: Software engineering
Abstract/Summary:
With the rapid development of Internet technology, new applications such as the Internet of Things, social networking, and smart devices have accumulated massive amounts of data. Big data and cloud computing are two sides of the same coin: data at this scale must be processed in parallel by distributed computers rather than by a single machine. MapReduce is currently a popular programming model for such parallel computation. When processing big data with Hadoop, most users focus on accelerating data processing; this thesis pays more attention to resource utilization from the system's point of view. Building on domestic and international research on the MapReduce programming model, we address the challenge of designing analytical models that estimate MapReduce performance. We also propose a new scheduling optimization algorithm, providing a theoretical basis and algorithmic support for more efficient big data processing.

There are two main innovations in this thesis.

First, we develop an analytical model of big data centers based on queueing theory. The new queueing model fits the MapReduce programming model accurately and captures its essential characteristics. From it, we derive the utilization and mean waiting time of the Mapper and of the Reducer separately, and show how the workload affects system performance (i.e., utilization). The significance of this part is that it provides theoretical insight into the MapReduce programming model and recommends optimal parameters for configuring computing resources. Moreover, using the analytical model, we can tune system parameters and adjust the workload to improve system utilization.

Second, since today's data centers run many MapReduce jobs in parallel, it is important to find a good scheduling algorithm that optimizes the completion times of these jobs.
We devise a scheduling algorithm that finds a good ordering of jobs to minimize the overall job completion time and the idle gaps in the system. Using simulations, we compare our algorithm with standard scheduling strategies such as FIFO and show that it improves system performance effectively.
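The abstract does not give the queueing model's equations, so the following is only a rough illustration of the first innovation, under assumptions of our own: each stage (the Mapper slot pool, the Reducer slot pool) is treated as an independent M/M/c queue with Poisson task arrivals at rate `lam`, exponential service at rate `mu` per slot, and `c` slots. Utilization is then rho = lam / (c * mu), and the mean waiting time follows from the Erlang C formula. The names `StageMetrics`, `mmc_metrics`, and `min_slots`, and all rates used, are hypothetical; the thesis's own model may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class StageMetrics:
    utilization: float  # rho = lam / (c * mu), fraction of time a slot is busy
    mean_wait: float    # W_q, mean time a task waits before a slot frees up


def mmc_metrics(lam: float, mu: float, c: int) -> StageMetrics:
    """Steady-state M/M/c metrics for one stage (e.g. the Mapper slot pool).

    Assumption (not from the thesis): Poisson arrivals at rate lam,
    exponential service at rate mu per slot, and c identical slots.
    """
    if lam <= 0 or mu <= 0 or c < 1:
        raise ValueError("need lam > 0, mu > 0 and c >= 1")
    rho = lam / (c * mu)
    if rho >= 1:
        raise ValueError("unstable stage: lam must be below c * mu")
    a = lam / mu      # offered load in Erlangs
    term = 1.0        # a**k / k!, built iteratively to avoid huge factorials
    partial = 0.0     # sum of a**k / k! for k = 0 .. c-1
    for k in range(c):
        partial += term
        term *= a / (k + 1)
    # After the loop, term == a**c / c!; Erlang C gives P(arriving task waits).
    tail = term / (1 - rho)
    p_wait = tail / (partial + tail)
    return StageMetrics(utilization=rho, mean_wait=p_wait / (c * mu - lam))


def min_slots(lam: float, mu: float, max_wait: float, c_max: int = 1000) -> int:
    """Hypothetical helper: fewest slots whose mean wait is at most max_wait."""
    c = math.floor(lam / mu) + 1  # fewest slots that keep the stage stable
    while c <= c_max:
        if mmc_metrics(lam, mu, c).mean_wait <= max_wait:
            return c
        c += 1
    raise ValueError("no slot count up to c_max meets the target")


if __name__ == "__main__":
    # Hypothetical workload sweep with made-up rates (tasks per second):
    # raising the arrival rate raises utilization and waiting time.
    for lam in (2.0, 4.0, 6.0):
        mapper = mmc_metrics(lam, mu=1.0, c=8)
        reducer = mmc_metrics(lam / 4, mu=0.5, c=4)
        print(f"lam={lam}: mapper rho={mapper.utilization:.2f}, "
              f"Wq={mapper.mean_wait:.3f}s; reducer rho={reducer.utilization:.2f}, "
              f"Wq={reducer.mean_wait:.3f}s")
```

`min_slots` mirrors the idea of recommending a resource configuration from the model: with lam = mu = 1, a mean-wait target of 0.5 time units needs 2 slots, while a target of 0.1 needs 3.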
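The abstract does not name the ordering rule used by the thesis's scheduler. A classic technique for ordering jobs that each pass through two stages is Johnson's rule for the two-machine flow shop, which minimizes makespan; the sketch below uses it purely as an illustrative stand-in, treating each job's map and reduce phases as the two stages, and compares it with FIFO. The `Job` tuples, the `simulate` and `johnson_order` helpers, the simplified pipeline, and the job times are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical job record: (name, map-stage time, reduce-stage time).
Job = Tuple[str, float, float]


def simulate(order: List[Job]) -> Tuple[float, float]:
    """Return (makespan, reduce-stage idle time) for jobs run in this order.

    Simplified two-stage pipeline (an assumption, not the thesis's model):
    the map stage runs one job at a time, and a job's reduce stage starts
    once its own map stage is done and the previous reduce stage is free.
    """
    map_done = 0.0
    reduce_done = 0.0
    idle = 0.0  # time the reduce stage sits waiting for map output
    for _, m, r in order:
        map_done += m
        idle += max(0.0, map_done - reduce_done)
        reduce_done = max(reduce_done, map_done) + r
    return reduce_done, idle


def johnson_order(jobs: List[Job]) -> List[Job]:
    """Johnson's rule for a two-stage flow shop (minimizes makespan).

    Jobs whose map time <= reduce time go first, by increasing map time;
    the remaining jobs go last, by decreasing reduce time.
    """
    first = sorted((j for j in jobs if j[1] <= j[2]), key=lambda j: j[1])
    last = sorted((j for j in jobs if j[1] > j[2]), key=lambda j: j[2], reverse=True)
    return first + last


if __name__ == "__main__":
    jobs = [("J1", 4.0, 1.0), ("J2", 1.0, 5.0), ("J3", 3.0, 3.0)]  # made-up times
    print("FIFO   :", [j[0] for j in jobs], simulate(jobs))
    ordered = johnson_order(jobs)
    print("Johnson:", [j[0] for j in ordered], simulate(ordered))
```

With these made-up times, FIFO (J1, J2, J3) finishes at 13 time units with 4 units of reduce-stage idle time, while Johnson's order (J2, J3, J1) finishes at 10 with 1 unit of idle time: starting with short-map, long-reduce jobs keeps the reduce stage busy sooner, which is the kind of gap reduction the abstract describes.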
Keywords: MapReduce, Analytical Model, Workloads, Resource Utilization, Scheduling Algorithm