Research On Performance Modelling And Optimization Of Hadoop

Posted on: 2018-06-10   Degree: Master   Type: Thesis
Country: China   Candidate: X Chen   Full Text: PDF
GTID: 2348330563952444   Subject: Computer Science and Technology
Abstract/Summary:
Processing massive data with the Map/Reduce programming model is the mainstream big data computing solution. The core idea of a Map/Reduce platform is to store and process massive data across distributed compute nodes, and to use task scheduling so that data is processed locally where it is stored, which improves processing efficiency. Hadoop, the most representative Map/Reduce platform, is now widely adopted in both industry and academia. Optimizing Hadoop performance shortens task execution time, raises the platform's job throughput, and increases the benefit to the platform owner. As Hadoop clusters continue to grow in node count, Hadoop performance optimization has become one of the hot topics in big data research.

However, the Hadoop system stack is large and its components interact in complex ways, which makes performance optimization challenging. Existing optimization work mainly adjusts Hadoop runtime configuration parameters and task scheduling strategies; that is, it focuses on the Hadoop runtime environment level. This is only a partial optimization of the Hadoop system stack, and it lacks whole-stack performance optimization that spans the application level and the software system level.

To address these problems, this thesis proposes an approach to Hadoop performance optimization based on performance modeling. First, the H-Roofline model, which quantitatively characterizes the mapping between Hadoop platform performance and the system stack, is established from the Roofline model (a standard performance characterization model for high-performance computing systems) and the performance characteristics of Hadoop. Then, the mapping between Hadoop platform performance and system stack configuration parameters is established from the H-Roofline model in order to optimize the whole system stack of the Hadoop platform. Finally, the system is tested and evaluated.

The main contributions of the thesis are as follows:

(1) A quantitative performance evaluation model for the Hadoop platform, H-Roofline, is established. Based on an analysis of Hadoop's resource usage characteristics, H-Roofline takes memory access performance and disk I/O performance, the major factors influencing Hadoop performance, as its basic variables. It characterizes the quantitative mapping among Hadoop performance, memory access, disk I/O, and other related resources, and thereby quantifies the attainable upper bound on system performance under different resource allocations.

(2) A performance optimization method for the Hadoop platform is designed on top of H-Roofline. A dispersion-based classification method selects the parameters that are sensitive to memory access bandwidth and disk I/O bandwidth, and a quantitative mapping between these parameters and the upper performance bound is established. This yields performance optimization that covers the application level and the software system level.

(3) The accuracy of the performance model is tested. The H-Roofline model is used for performance testing and evaluation on five typical big data benchmark program sets. The results show that, under the guidance of the tuning methodology, the job execution time of big data workloads is reduced by at least 16.1%.
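
The Roofline idea that H-Roofline builds on can be illustrated with a minimal sketch. The abstract does not give the thesis's actual formulas, so the code below is not the author's model: it assumes the classic Roofline bound, attainable performance = min(peak compute, bandwidth × operational intensity), and applies it with two bandwidth ceilings, one for memory access and one for disk I/O. All function names, parameter names, and numbers are hypothetical and chosen only for illustration.

```python
def roofline_bound(peak_ops_per_s, bandwidth_bytes_per_s, intensity_ops_per_byte):
    """Classic Roofline bound: the lower of the compute peak and the
    bandwidth ceiling (bandwidth * operational intensity)."""
    return min(peak_ops_per_s, bandwidth_bytes_per_s * intensity_ops_per_byte)


def two_ceiling_bound(peak_ops_per_s, mem_bw, mem_intensity, disk_bw, disk_intensity):
    """Assumed two-resource variant: the attainable performance is limited
    by whichever of the memory and disk I/O ceilings is tighter."""
    mem_limit = roofline_bound(peak_ops_per_s, mem_bw, mem_intensity)
    disk_limit = roofline_bound(peak_ops_per_s, disk_bw, disk_intensity)
    return min(mem_limit, disk_limit)


def limiting_resource(peak_ops_per_s, mem_bw, mem_intensity, disk_bw, disk_intensity):
    """Name the resource whose ceiling is lowest (the bottleneck)."""
    ceilings = {
        "compute": peak_ops_per_s,
        "memory": mem_bw * mem_intensity,
        "disk": disk_bw * disk_intensity,
    }
    return min(ceilings, key=ceilings.get)


# Hypothetical node: 100 Gop/s peak, 20 GB/s memory bandwidth, 200 MB/s disk bandwidth.
# Hypothetical workload: 2 ops per byte of memory traffic, 50 ops per byte of disk traffic.
example = dict(
    peak_ops_per_s=100_000_000_000,
    mem_bw=20_000_000_000,
    mem_intensity=2,
    disk_bw=200_000_000,
    disk_intensity=50,
)

print(two_ceiling_bound(**example))
print(limiting_resource(**example))
```

In this made-up configuration the disk ceiling is the lowest of the three, so under these assumptions tuning would target disk-related parameters first. This mirrors, in simplified form, the abstract's idea of selecting parameters by their sensitivity to memory access or disk I/O bandwidth.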
Keywords/Search Tags: big data, Hadoop, performance model, Roofline, parameter optimization