
The Analysis And Optimization Of Hadoop Data Processing Performance On Parallel Computing

Posted on: 2015-03-08  Degree: Master  Type: Thesis
Country: China  Candidate: J L Yao
GTID: 2298330467972348  Subject: Communication and Information System
Abstract/Summary:
With the development and popularization of new-generation mobile communication, the Internet of Things, and cloud computing, data traffic is growing explosively and placing increasing pressure on data processing. By virtue of its powerful data processing capability, the Hadoop MapReduce programming framework has become a mature solution for text analysis, natural language processing, and business data processing, and it can relieve the data-processing bottleneck of communication systems. However, the lack of cost-based parameter optimization in MapReduce frameworks becomes a major limiting factor as MapReduce usage grows beyond large Web companies to new applications. About 13 of the 200 configuration parameters have a major effect on cluster performance. To address this problem, this thesis designs a new parameter configuration analysis system based on Hadoop tuning, so that each task receives optimized parameters that improve its performance.

Building on the MapReduce framework, we propose three new components: the Profiler, the Judge-Engine, and the Cost-based Optimizer. The Profiler collects detailed statistical information from unmodified MapReduce programs. The Judge-Engine performs fine-grained cost estimation. The Cost-based Optimizer provides the best, simplified set of parameters based on the output of the other two components.

We compare optimized parameters against default parameters in typical MapReduce applications: text analysis, natural language processing, and business data processing. Through this comprehensive evaluation across representative MapReduce application domains, we demonstrate the effectiveness of each component. The results show that, with the help of these three new components, the new optimization model makes Hadoop parameter optimization much easier.
Keywords/Search Tags:Hadoop, Performance Optimization, Parameters, MapReduce
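
The three-component flow described in the abstract (profile a job, estimate the cost of a candidate configuration, pick the cheapest one) can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the profile values, the candidate ranges, and the cost formula in `estimate_cost` are all invented for the example. Only the two property names, `mapreduce.task.io.sort.mb` and `mapreduce.job.reduces`, are real Hadoop settings, and the abstract does not say which parameters the thesis actually tunes.

```python
from itertools import product

# Hypothetical job profile, standing in for Profiler output (values invented).
profile = {"map_output_mb": 400, "reduce_input_mb": 2000}

# Candidate values for two real Hadoop properties; the ranges are illustrative.
search_space = {
    "mapreduce.task.io.sort.mb": [100, 200, 400],
    "mapreduce.job.reduces": [1, 4, 8, 16],
}


def estimate_cost(profile, config):
    """Toy cost model standing in for the Judge-Engine (formula is an assumption)."""
    sort_mb = config["mapreduce.task.io.sort.mb"]
    reduces = config["mapreduce.job.reduces"]
    # Number of spills needed to write the map output (ceiling division).
    spills = -(-profile["map_output_mb"] // sort_mb)
    map_cost = profile["map_output_mb"] * spills
    # Reduce work shrinks with more reducers, but each reducer adds fixed overhead.
    reduce_cost = profile["reduce_input_mb"] / reduces + 50 * reduces
    return map_cost + reduce_cost


def optimize(profile, search_space):
    """Exhaustive search standing in for the Cost-based Optimizer."""
    names = list(search_space)
    candidates = (
        dict(zip(names, values)) for values in product(*search_space.values())
    )
    return min(candidates, key=lambda cfg: estimate_cost(profile, cfg))


best = optimize(profile, search_space)
print(best, estimate_cost(profile, best))
```

Exhaustive search is only workable here because the toy search space has twelve combinations; a real optimizer over many parameters would likely need a smarter search strategy.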