Research On Automatic Optimization Of Hadoop Parameters

Posted on: 2015-05-12
Degree: Master
Type: Thesis
Country: China
Candidate: F P Chen
Full Text: PDF
GTID: 2428330488499702
Subject: Software engineering
Abstract/Summary:
Apache Hadoop is a popular open-source software framework that supports distributed processing of large data sets. Hadoop's performance depends on many factors, such as cluster configuration, the properties of the running job, task scheduling, and parameter configuration. Parameter configuration is one of the key factors, and a reasonable configuration can improve Hadoop's performance considerably. This thesis introduces the basic framework and operating mechanism of Hadoop, reviews the state of research, and summarizes the deficiencies of existing optimization approaches. The traditional brute-force search method, and the optimization models built on it, cannot explain why or how a given parameter improves Hadoop's performance. Differences in Hadoop's function-call information reflect, to a certain degree, the nature of the differences in its performance. From this point of view, this thesis presents an automated optimization model based on feedback from function-call monitoring. The model profiles the characteristics of a job and optimizes its parameters automatically. If the model fails to fetch optimal parameters from its database, it samples the job's input with a sampling algorithm and applies a heuristic optimization method, driven by feedback from function-call monitoring, to optimize the parameters. This heuristic method analyzes the root cause of the performance difference and works out a corresponding prioritization scheme. To monitor function-call information, this thesis analyzes Hadoop's function-call mechanism. To monitor the function-call information of each phase separately, it divides the task execution process into distinct phases. Monitoring is achieved without directly modifying the Hadoop source code by introducing aspect-oriented programming (AOP). Finally, this thesis demonstrates the effectiveness of the model through several examples. The automated optimization model presented here analyzes the root cause of performance differences from the perspective of function-call information and works out a corresponding effective prioritization scheme, which reduces the blindness of parameter-tuning experiments. The results of the examples show that, with this model, it is convenient to determine how a particular parameter improves Hadoop's performance and to devise a corresponding efficient prioritization scheme. Performance improved by at least 20% after optimizing a limited set of parameters.
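The lookup-then-fallback flow described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the function names (`sample_input`, `heuristic_tune`, `optimize`), the in-memory dictionary standing in for the model's database, simple random sampling as the sampling algorithm, a one-parameter-at-a-time search as the heuristic, and the `measure` callback, which stands in for the cost derived from function-call monitoring. Writing the tuned result back to the store is also an assumption.

```python
import random
from typing import Callable, Dict, List

Params = Dict[str, int]


def sample_input(records: List, fraction: float, seed: int = 0) -> List:
    """Hypothetical sampler: draw a simple random subset of the job input."""
    rng = random.Random(seed)
    k = max(1, int(len(records) * fraction))
    return rng.sample(records, k)


def heuristic_tune(
    sample: List,
    candidates: Dict[str, List[int]],
    measure: Callable[[List, Params], float],
) -> Params:
    """Hypothetical heuristic: tune one parameter at a time.

    `measure` stands in for the feedback signal (lower is better).
    """
    # Start from the first candidate value of every parameter.
    best = {name: values[0] for name, values in candidates.items()}
    for name, values in candidates.items():
        best_cost = None
        for value in values:
            trial = {**best, name: value}
            cost = measure(sample, trial)
            if best_cost is None or cost < best_cost:
                best_cost = cost
                best[name] = value
    return best


def optimize(
    profile_key: str,
    records: List,
    store: Dict[str, Params],
    candidates: Dict[str, List[int]],
    measure: Callable[[List, Params], float],
) -> Params:
    """Return stored parameters for a job profile, or tune on a sample."""
    cached = store.get(profile_key)
    if cached is not None:
        return cached  # parameters already known for this job profile
    sample = sample_input(records, fraction=0.1)
    tuned = heuristic_tune(sample, candidates, measure)
    store[profile_key] = tuned  # assumed: results are saved for reuse
    return tuned
```

In this sketch a second call with the same `profile_key` returns the stored parameters without sampling or tuning again, which mirrors the database lookup the abstract describes as the first step.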
Keywords/Search Tags: Hadoop, Hadoop performance monitoring, Hadoop performance optimization, Hadoop parameters optimization, Hadoop function monitoring