A Cost-based Optimizer For Configuration Parameters Of Hadoop

Posted on: 2014-08-07
Degree: Master
Type: Thesis
Country: China
Candidate: L X Ceng
GTID: 2268330422463449
Subject: Computer software and theory
Abstract/Summary:
MapReduce is an efficient tool for large-scale data processing. Hadoop, an open-source implementation of MapReduce, has been widely adopted because of its scalability and fault tolerance. However, even to run a single program in Hadoop, users or system administrators have to tune a number of configuration parameters to ensure that the program runs efficiently. More than 190 parameters in Hadoop can control the behavior of a MapReduce job. Because users lack knowledge of how to set these parameters, they often run into performance problems.
HCOpt, a Cost-based Optimizer for Configuration Parameters of Hadoop, focuses on the large space of configuration parameters in order to optimize the performance of MapReduce programs. It uses a dynamic bytecode tracing tool to collect monitoring information from running MapReduce programs, a lightweight MapReduce simulator to estimate the performance of a given Hadoop configuration, and a genetic search algorithm to find an optimized configuration in the large search space. By tracking the runtime information of MapReduce jobs through dynamic bytecode injection, HCOpt minimizes its coupling with Hadoop, which lets it adapt to various versions of Hadoop. At the same time, optimizing the Hadoop configuration improves the performance of Hadoop applications because system resources are used to full advantage.
The effectiveness of HCOpt is demonstrated through a comprehensive evaluation using representative MapReduce programs. The results show that HCOpt reduces the job completion time of these applications by up to 50% compared with running them under the default configuration, and by 29% to 54% compared with running them under the configuration suggested by the Rule-based Optimization.
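The search step described in the abstract, a genetic algorithm that scores candidate configurations with a cost estimate, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the three parameters and their ranges are an arbitrary choice, `toy_cost` is a made-up stand-in for the simulator and does not model Hadoop, and the selection, crossover, and mutation operators are generic choices rather than the ones HCOpt uses.

```python
import random

# Hypothetical search space: (low, high) integer ranges for a few Hadoop
# parameters. The ranges are illustrative assumptions, not from the thesis.
SPACE = {
    "mapreduce.task.io.sort.mb": (50, 1000),
    "mapreduce.task.io.sort.factor": (5, 100),
    "mapreduce.job.reduces": (1, 64),
}


def toy_cost(config):
    """Stand-in for a simulator: a made-up 'job time' (lower is better).

    This is NOT a model of Hadoop; it only gives the search a target.
    """
    sort_mb = config["mapreduce.task.io.sort.mb"]
    factor = config["mapreduce.task.io.sort.factor"]
    reduces = config["mapreduce.job.reduces"]
    return abs(sort_mb - 400) / 10 + abs(factor - 40) + abs(reduces - 16) * 2


def random_config(rng):
    # Draw each parameter uniformly from its allowed range.
    return {k: rng.randint(lo, hi) for k, (lo, hi) in SPACE.items()}


def crossover(a, b, rng):
    # Uniform crossover: each parameter is copied from one parent at random.
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SPACE}


def mutate(config, rng, rate=0.2):
    # With probability `rate`, resample a parameter within its range.
    out = dict(config)
    for k, (lo, hi) in SPACE.items():
        if rng.random() < rate:
            out[k] = rng.randint(lo, hi)
    return out


def genetic_search(cost, generations=40, pop_size=30, elite=5, seed=0):
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        parents = population[:elite]  # elitism: the best configs survive
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        population = parents + children
    return min(population, key=cost)


best = genetic_search(toy_cost)
print(best, toy_cost(best))
```

Because the best candidates are carried over unchanged each generation, the best estimated cost never gets worse as the search proceeds. In a real optimizer the cost function would be the expensive part, which is why the abstract pairs the search with a lightweight simulator instead of running each candidate configuration on a cluster.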
Keywords/Search Tags: MapReduce, Hadoop, Performance Optimization, Automatic Parameter Adjustment