Application Characteristics Analysis And Performance Optimization For Hadoop System

Posted on: 2018-03-01
Degree: Master
Type: Thesis
Country: China
Candidate: J R Wang
GTID: 2348330518971074
Subject: Information and Communication Engineering
Abstract/Summary:
Hadoop MapReduce is the most widely used big-data system today. Although Hadoop provides an efficient solution for large-scale distributed data processing, it still faces several challenges: (1) Hadoop's abstract programming interface hides the underlying implementation details, which makes it difficult to analyze application performance; (2) Hadoop's configuration parameters have a significant impact on system performance, yet the default configuration cannot guarantee that every application runs at its best; (3) frequent data movement between processing units and memory is becoming one of the most critical performance bottlenecks in big-data systems, so new solutions are needed to reduce its negative impact.

This thesis focuses on characterizing and optimizing the performance of applications running on Hadoop. First, we propose and develop a lightweight, non-intrusive, distributed performance analysis framework for Hadoop based on dynamic bytecode instrumentation. It dynamically captures an application's runtime details, helps users understand the application's performance characteristics, and provides insight for optimizing it.

Second, to tune Hadoop's configuration parameters, we present a performance model of Hadoop applications for dynamic resource allocation scenarios and use a genetic algorithm to search the global, high-dimensional configuration space. The performance model predicts the execution time of a MapReduce application with an average error rate below 6%. Compared with the default configuration, the proposed tuning method improves system performance by 9.52X on average and by up to 18.76X.

Finally, we propose a near-data processing (NDP) system that exploits the data parallelism of MapReduce applications. We develop the hardware/software interface, a dynamic task offload mechanism, and a runtime environment, which together hide all details of the NDP hardware. We also implement a lightweight MapReduce framework that supports migrating Map tasks and Reduce tasks to NDP units. Compared with a baseline system without near-data processing, the proposed NDP system achieves a 4.83X performance improvement and a 26% reduction in system energy; compared with SMC, which does not support parallel data processing, it achieves a 2.32X performance improvement at the cost of a 37% increase in system power consumption.
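The configuration-tuning approach in the abstract pairs a performance model with a genetic algorithm. The sketch below illustrates that pairing under stated assumptions and is not the thesis's implementation. `predicted_time` is a hypothetical toy cost function standing in for the trained performance model. The four property names are real Hadoop MapReduce configuration keys, but their search ranges and the `BASELINE` values are illustrative assumptions. The GA operators (tournament selection, uniform crossover, bounded mutation, elitism) are common textbook choices that the abstract does not specify.

```python
import random

# Real Hadoop MapReduce property names; the (low, high) ranges are illustrative assumptions.
SPACE = {
    "mapreduce.task.io.sort.mb": (50, 500),
    "mapreduce.task.io.sort.factor": (5, 100),
    "mapreduce.job.reduces": (1, 64),
    "mapreduce.map.memory.mb": (512, 4096),
}

# Hypothetical baseline, loosely resembling stock Hadoop 2.x values.
BASELINE = {
    "mapreduce.task.io.sort.mb": 100,
    "mapreduce.task.io.sort.factor": 10,
    "mapreduce.job.reduces": 1,
    "mapreduce.map.memory.mb": 1024,
}


def predicted_time(cfg):
    """Hypothetical toy performance model (NOT the thesis's model).

    Returns a predicted job execution time in arbitrary units.
    """
    sort_mb = cfg["mapreduce.task.io.sort.mb"]
    factor = cfg["mapreduce.task.io.sort.factor"]
    reduces = cfg["mapreduce.job.reduces"]
    mem = cfg["mapreduce.map.memory.mb"]
    spill = 1000.0 / sort_mb                       # smaller sort buffer -> more spills
    merge = 200.0 / factor                         # fewer merge streams -> more merge passes
    reduce_cost = 400.0 / reduces + 2.0 * reduces  # parallelism vs. per-task overhead
    # Penalize a sort buffer that does not fit comfortably in the map container.
    overflow = max(0.0, 1.5 * sort_mb - mem) * 0.1
    # Larger containers cost a little (fewer containers fit per node).
    return spill + merge + reduce_cost + overflow + mem / 1024.0


def random_config(rng):
    return {k: rng.randint(lo, hi) for k, (lo, hi) in SPACE.items()}


def crossover(a, b, rng):
    # Uniform crossover: each parameter is copied from one parent at random.
    return {k: a[k] if rng.random() < 0.5 else b[k] for k in SPACE}


def mutate(cfg, rng, rate=0.2):
    # With probability `rate`, nudge a parameter by up to 10% of its range, clamped to bounds.
    out = dict(cfg)
    for k, (lo, hi) in SPACE.items():
        if rng.random() < rate:
            step = max(1, (hi - lo) // 10)
            out[k] = min(hi, max(lo, out[k] + rng.randint(-step, step)))
    return out


def tournament(pop, rng, size=3):
    # Tournament selection: the fastest of `size` randomly drawn configurations wins.
    return min(rng.sample(pop, size), key=predicted_time)


def genetic_search(generations=50, pop_size=30, seed=0):
    rng = random.Random(seed)
    pop = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        elite = min(pop, key=predicted_time)  # elitism: best configuration survives unchanged
        children = [elite]
        while len(children) < pop_size:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            children.append(mutate(child, rng))
        pop = children
    return min(pop, key=predicted_time)


if __name__ == "__main__":
    best = genetic_search()
    print(f"baseline: {predicted_time(BASELINE):.1f}")
    print(f"tuned:    {predicted_time(best):.1f} {best}")
```

Because each fitness evaluation is a model prediction rather than a real cluster run, the search can afford thousands of evaluations. That cost asymmetry is the usual reason to put a performance model in front of a search heuristic.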
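The NDP results report an energy figure against the baseline but a power figure against SMC. Both comparisons can be put on the same footing with a short calculation. It rests on two assumptions, since the abstract defines neither metric: energy equals average power times execution time, and each reported speedup is a ratio of execution times for the same workload.

$$
\frac{E_{\mathrm{NDP}}}{E_{\mathrm{SMC}}}
= \frac{P_{\mathrm{NDP}}}{P_{\mathrm{SMC}}}\cdot\frac{T_{\mathrm{NDP}}}{T_{\mathrm{SMC}}}
= 1.37 \times \frac{1}{2.32} \approx 0.59
$$

$$
\frac{P_{\mathrm{NDP}}}{P_{\mathrm{base}}}
= \frac{E_{\mathrm{NDP}}/E_{\mathrm{base}}}{T_{\mathrm{NDP}}/T_{\mathrm{base}}}
= \frac{0.74}{1/4.83} = 0.74 \times 4.83 \approx 3.57
$$

Under these assumptions, the NDP system would use roughly 41% less energy than SMC for the same job. It would also draw roughly 3.6 times the baseline's average power while still consuming less total energy, because it finishes 4.83 times sooner.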
Keywords/Search Tags: Hadoop, MapReduce, Application Characteristics, Optimization, Near-Data Processing