Research On MapReduce Scheduler For Iterative Applications

Posted on: 2016-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: C Wang
Full Text: PDF
GTID: 2348330479953425
Subject: Computer application technology
Abstract/Summary:
MapReduce is a popular distributed computing framework for massive-scale, data-intensive processing, with good scalability and flexibility. Its design focuses on processing data in a single pass with one MapReduce job, and it does not provide explicit support for iterative or recursive types of analysis. Many researchers have proposed improvements to address this problem. However, most of that work suffers from load imbalance because the task load allocation is fixed.

We analyze the main cause of the load imbalance and the characteristics of iterative applications, and then build a feedback mechanism between iterations to solve the problem. This paper presents a feedback-based scheduler (FBS) for iterative applications, which improves performance by dynamically adjusting its task load allocation scheme between iterations.

In addition, the ineffectiveness of speculative execution in Hadoop is a long-recognized problem. We analyze its cause through experiments and find that the main reason is that Hadoop does not consider the different features of tasks or the capability of computing nodes, which results in imprecise progress estimation. We then present corresponding improvements for iterative applications.

Finally, we evaluate our work on real applications and real-world datasets. Compared with HaLoop, we reduce the runtime of each iteration by over 60% on average, and our approach estimates the progress of tasks more accurately.
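
The feedback idea described above can be illustrated with a minimal sketch. The abstract does not specify how FBS computes its allocation, so everything here is an assumption made for illustration: the function name `rebalance`, the use of per-task record counts and runtimes as the feedback signal, and the proportional-to-throughput rule are all hypothetical and are not taken from the thesis.

```python
def rebalance(loads, runtimes):
    """Hypothetical feedback step between two iterations.

    loads    -- records assigned to each task in the previous iteration
    runtimes -- seconds each task took in the previous iteration
    Returns new per-task loads proportional to the observed throughput,
    keeping the total number of records unchanged.
    """
    if len(loads) != len(runtimes):
        raise ValueError("loads and runtimes must have the same length")
    if any(t <= 0 for t in runtimes):
        raise ValueError("runtimes must be positive")

    total = sum(loads)
    # Observed throughput (records per second) of each task last iteration.
    rates = [load / t for load, t in zip(loads, runtimes)]
    rate_sum = sum(rates)
    if rate_sum == 0:
        return list(loads)  # nothing was processed; keep the old allocation

    # Assign load in proportion to throughput, rounding down.
    new_loads = [int(total * r / rate_sum) for r in rates]

    # Hand the records lost to rounding to the fastest tasks first.
    remainder = total - sum(new_loads)
    fastest_first = sorted(range(len(rates)), key=lambda i: rates[i], reverse=True)
    for i in fastest_first[:remainder]:
        new_loads[i] += 1
    return new_loads


# Three tasks with equal load; the first finished twice as fast as the others.
print(rebalance([100, 100, 100], [10, 20, 20]))
```

Under this assumed rule, a task that ran twice as fast in the previous iteration receives twice the load in the next one, so all tasks would be expected to finish at about the same time if node speeds stay stable between iterations.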
Keywords/Search Tags:MapReduce, Hadoop, Iterative Application, Speculative Execution, Task Scheduling