
Study on Resource Context and Job Cost-Aware Job Scheduling Optimization for the Hadoop MapReduce Framework

Posted on: 2014-09-11
Degree: Master
Type: Thesis
Country: China
Candidate: J S Yan
Full Text: PDF
GTID: 2308330482950342
Subject: Computer technology
Abstract/Summary:
In recent years, with the rapid development and widespread application of information technology, the scale of industrial computing systems has grown at an astonishing rate, and so has the volume of data these systems generate. Traditional relational database systems can no longer capture and process data at this volume, so big data processing techniques have become an urgent practical need. In this context, both industry and academia have reached a consensus on adopting parallel computing to handle big data, that is, processing it in parallel on top of large-scale distributed storage and parallel computing platforms.

The MapReduce technique, originally published by Google, has become the most successful approach to big data processing thanks to its high scalability and ease of use. Hadoop, the mainstream open-source implementation of Google MapReduce, has become the de facto industrial standard for big data processing. However, the current implementation is geared toward large-scale batch processing and ignores the low-latency demands of many real applications, such as online data processing and interactive queries. Targeted performance optimization of the MapReduce framework is therefore a research problem of active interest.

To improve the performance of MapReduce, we delved into its execution framework and made targeted optimizations. The main contributions are the following two points:

(1) The degree of parallelism, which Hadoop fixes through a parameter named slot once the system starts, is a key factor in parallel computing. A static setting wastes resources when a node executes lightweight tasks and exhausts resources under heavy ones. To address this, we designed and implemented a resource-context-aware optimization of the MapReduce framework that dynamically adjusts the number of tasks allocated to each node (a sketch of the idea follows this abstract).

(2) The job scheduler is an important component of Hadoop, but most mainstream scheduling algorithms do not balance jobs according to their resource-cost characteristics, so different nodes inevitably exhaust different kinds of resources. To address this, we propose a targeted job scheduling algorithm that places jobs according to their resource-cost profiles (see the second sketch below).

Finally, we use benchmarks to evaluate the performance improvement of each optimization. The experimental results show that our designs are effective.
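To make contribution (1) concrete, the following is a minimal, hypothetical sketch of resource-context-aware slot allocation: a node samples its current CPU load and scales the number of task slots it advertises between a floor and a ceiling, instead of using one static slot count. All names here (DynamicSlotAllocator, minSlots, maxSlots) are illustrative assumptions, not the thesis implementation.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/**
 * Illustrative sketch only: advertise more task slots when the node has
 * CPU headroom, fewer when it is loaded. Not the thesis code.
 */
public class DynamicSlotAllocator {
    private final int minSlots;
    private final int maxSlots;

    public DynamicSlotAllocator(int minSlots, int maxSlots) {
        this.minSlots = minSlots;
        this.maxSlots = maxSlots;
    }

    /** Scale the advertised slot count by current CPU headroom. */
    public int currentSlots() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        int cores = os.getAvailableProcessors();
        double load = os.getSystemLoadAverage();   // may return -1 if unavailable
        if (load < 0) {
            return maxSlots;                       // no signal: keep the static default
        }
        double headroom = Math.max(0.0, 1.0 - load / cores);
        int slots = (int) Math.round(minSlots + headroom * (maxSlots - minSlots));
        return Math.min(maxSlots, Math.max(minSlots, slots));
    }

    public static void main(String[] args) {
        DynamicSlotAllocator alloc = new DynamicSlotAllocator(2, 8);
        System.out.println("slots to advertise: " + alloc.currentSlots());
    }
}
```

The key design point is that the slot count becomes a function of observed load rather than a startup constant, so light workloads no longer strand capacity and heavy workloads no longer oversubscribe the node.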
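For contribution (2), here is a similarly hedged sketch of cost-aware placement: jobs are tagged with their dominant resource demand (e.g., CPU-bound vs. I/O-bound), and the scheduler prefers a queued job whose dominant cost complements the node's current bottleneck, falling back to FIFO otherwise. Again, every class and field name is an assumption for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

/**
 * Illustrative sketch only: avoid co-locating jobs that stress the same
 * resource a node is already short of. Not the thesis algorithm.
 */
public class CostAwareScheduler {
    enum Resource { CPU, IO }

    static final class Job {
        final String name;
        final Resource dominantCost;   // assumed to be profiled from history
        Job(String name, Resource dominantCost) {
            this.name = name;
            this.dominantCost = dominantCost;
        }
    }

    private final Deque<Job> queue = new ArrayDeque<>();

    void submit(Job job) { queue.addLast(job); }

    /** Pick the first queued job that does NOT stress the node's scarce resource. */
    Job nextFor(Resource nodeBottleneck) {
        Iterator<Job> it = queue.iterator();
        while (it.hasNext()) {
            Job job = it.next();
            if (job.dominantCost != nodeBottleneck) {
                it.remove();
                return job;
            }
        }
        return queue.pollFirst();      // nothing complements: fall back to FIFO
    }

    public static void main(String[] args) {
        CostAwareScheduler s = new CostAwareScheduler();
        s.submit(new Job("sort", Resource.IO));
        s.submit(new Job("pi-estimate", Resource.CPU));
        // A node already saturating its disks should receive the CPU-bound job.
        System.out.println(s.nextFor(Resource.IO).name);   // prints: pi-estimate
    }
}
```

Spreading complementary resource profiles across nodes is what prevents the single-resource exhaustion the abstract describes: no node receives a stream of jobs that all contend for the same scarce resource.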
Keywords/Search Tags: big data, parallel computing, MapReduce, performance optimization, resource context aware, job cost, job scheduler