
Study on Resource Context and Job Cost-Aware Job Scheduling Optimization for the Hadoop MapReduce Framework

Posted on: 2014-09-11
Degree: Master
Type: Thesis
Country: China
Candidate: J S Yan
Full Text: PDF
GTID: 2308330482950342
Subject: Computer technology
Abstract/Summary:
In recent years, with the rapid development and widespread application of information technology, the scale of industrial computing systems has grown at an astonishing rate, and so has the volume of data these systems generate. Traditional relational database systems can no longer capture and process data at this volume, so big data processing techniques have become an urgent practical need. In this context, both industry and academia have reached a consensus on adopting parallel computing to handle big data, that is, processing it in parallel on top of large-scale distributed storage and parallel computing platforms.

The MapReduce technique, originally published by Google, has become the most successful approach to big data processing thanks to its high scalability and ease of use. Hadoop, the mainstream open-source implementation of Google MapReduce, has become the de facto industrial standard for big data processing. However, the current implementation is geared toward large-scale batch processing and ignores the low-latency demands of many real applications, such as online data processing and interactive queries. Targeted performance optimization of the MapReduce framework is therefore a research problem of active interest.

To improve the performance of MapReduce, we delved into its execution framework and made targeted optimizations. The main contributions are the following two points:

(1) The degree of parallelism, which Hadoop fixes through a parameter named slot once the system starts, is a key factor in parallel computing. A static setting wastes resources when a node executes lightweight tasks and exhausts resources under heavy ones. To address this, we designed and implemented a resource-context-aware optimization of the MapReduce framework that dynamically adjusts the number of tasks allocated to each node (a sketch of the idea follows this abstract).

(2) The job scheduler is an important component of Hadoop, but most mainstream scheduling algorithms do not balance jobs according to their resource-cost characteristics, so different nodes inevitably exhaust different kinds of resources. To address this, we propose a targeted job scheduling algorithm that places jobs according to their resource-cost profiles (see the second sketch below).

Finally, we use benchmarks to evaluate the performance improvement of each optimization. The experimental results show that our designs are effective.
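To make contribution (1) concrete, the following is a minimal, hypothetical sketch of resource-context-aware slot allocation: a node samples its current CPU load and scales the number of task slots it advertises between a floor and a ceiling, instead of using one static slot count. All names here (DynamicSlotAllocator, minSlots, maxSlots) are illustrative assumptions, not the thesis implementation.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/**
 * Illustrative sketch only: advertise more task slots when the node has
 * CPU headroom, fewer when it is loaded. Not the thesis code.
 */
public class DynamicSlotAllocator {
    private final int minSlots;
    private final int maxSlots;

    public DynamicSlotAllocator(int minSlots, int maxSlots) {
        this.minSlots = minSlots;
        this.maxSlots = maxSlots;
    }

    /** Scale the advertised slot count by current CPU headroom. */
    public int currentSlots() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        int cores = os.getAvailableProcessors();
        double load = os.getSystemLoadAverage();   // may return -1 if unavailable
        if (load < 0) {
            return maxSlots;                       // no signal: keep the static default
        }
        double headroom = Math.max(0.0, 1.0 - load / cores);
        int slots = (int) Math.round(minSlots + headroom * (maxSlots - minSlots));
        return Math.min(maxSlots, Math.max(minSlots, slots));
    }

    public static void main(String[] args) {
        DynamicSlotAllocator alloc = new DynamicSlotAllocator(2, 8);
        System.out.println("slots to advertise: " + alloc.currentSlots());
    }
}
```

The key design point is that the slot count becomes a function of observed load rather than a startup constant, so light workloads no longer strand capacity and heavy workloads no longer oversubscribe the node.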
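For contribution (2), here is a similarly hedged sketch of cost-aware placement: jobs are tagged with their dominant resource demand (e.g., CPU-bound vs. I/O-bound), and the scheduler prefers a queued job whose dominant cost complements the node's current bottleneck, falling back to FIFO otherwise. Again, every class and field name is an assumption for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

/**
 * Illustrative sketch only: avoid co-locating jobs that stress the same
 * resource a node is already short of. Not the thesis algorithm.
 */
public class CostAwareScheduler {
    enum Resource { CPU, IO }

    static final class Job {
        final String name;
        final Resource dominantCost;   // assumed to be profiled from history
        Job(String name, Resource dominantCost) {
            this.name = name;
            this.dominantCost = dominantCost;
        }
    }

    private final Deque<Job> queue = new ArrayDeque<>();

    void submit(Job job) { queue.addLast(job); }

    /** Pick the first queued job that does NOT stress the node's scarce resource. */
    Job nextFor(Resource nodeBottleneck) {
        Iterator<Job> it = queue.iterator();
        while (it.hasNext()) {
            Job job = it.next();
            if (job.dominantCost != nodeBottleneck) {
                it.remove();
                return job;
            }
        }
        return queue.pollFirst();      // nothing complements: fall back to FIFO
    }

    public static void main(String[] args) {
        CostAwareScheduler s = new CostAwareScheduler();
        s.submit(new Job("sort", Resource.IO));
        s.submit(new Job("pi-estimate", Resource.CPU));
        // A node already saturating its disks should receive the CPU-bound job.
        System.out.println(s.nextFor(Resource.IO).name);   // prints: pi-estimate
    }
}
```

Spreading complementary resource profiles across nodes is what prevents the single-resource exhaustion the abstract describes: no node receives a stream of jobs that all contend for the same scarce resource.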
Keywords/Search Tags: big data, parallel computing, MapReduce, performance optimization, resource context aware, job cost, job scheduler