
Research On In-memory Optimization Technologies Based On Machine Learning Techniques

Posted on: 2018-01-17
Degree: Master
Type: Thesis
Country: China
Candidate: N Luo
Full Text: PDF
GTID: 2348330566955730
Subject: Computer technology
Abstract/Summary:
Recently, in-memory cluster computing (IMC) has gained momentum because it accelerates traditional on-disk cluster computing (ODC) by up to several tens of times for iterative and interactive applications. The most popular IMC framework is Spark, which has more than 100 configuration parameters. However, because IMC is a fairly new computing paradigm, it is unclear how significantly these parameters affect system performance. Consequently, no study has yet addressed how to optimally configure IMC frameworks. In this paper, we first investigate how significantly the configuration parameters affect the performance of Spark workloads. We find that the configuration-induced performance variation can be as large as 20.7, indicating that configuration is extremely important to the performance of Spark workloads. However, manually configuring Spark workloads is notoriously difficult because there are so many configuration parameters, which may interfere with each other in complex ways. To address this issue, we propose an approach to Automatically Configure Spark workloads, named ACS. It first constructs performance models as functions of the Spark configuration parameters by using random forest, an ensemble learning algorithm. Subsequently, ACS leverages a genetic algorithm to search for the optimum configuration, taking configurations and the corresponding performance predicted by the performance models as inputs. We employ six Spark programs, each with five input data sets, to evaluate the performance improvements. The results show that ACS speeds up the 30 program-input pairs by a factor of 2.2X on average and up to 8.2X. In addition, the performance improvements obtained by ACS increase as the input data set sizes of the Spark workloads grow, which is a desirable property for big data analytics.
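The search step described in the abstract can be illustrated with a minimal sketch. It is not the ACS implementation: the three parameters, their ranges, and the `predicted_runtime` formula are all hypothetical, and `predicted_runtime` merely stands in for a trained random forest performance model. The genetic algorithm shown (elitist selection, uniform crossover, random-reset mutation) is one common variant; the abstract does not specify which operators ACS uses.

```python
import random

# Hypothetical search space (not taken from the thesis): inclusive integer
# ranges for executor memory in GB, executor cores, and parallelism.
BOUNDS = [(1, 16), (1, 8), (8, 256)]


def predicted_runtime(cfg):
    """Stand-in for a trained performance model.

    ACS would query a random forest here; this made-up formula only gives
    the search something to minimize.
    """
    mem_gb, cores, parallelism = cfg
    return 100.0 / cores + 50.0 / mem_gb + 0.5 * abs(parallelism - 16 * cores)


def random_config(rng):
    """Draw one configuration uniformly from the search space."""
    return tuple(rng.randint(lo, hi) for lo, hi in BOUNDS)


def crossover(a, b, rng):
    """Uniform crossover: each parameter comes from one of the two parents."""
    return tuple(rng.choice(pair) for pair in zip(a, b))


def mutate(cfg, rng, rate=0.2):
    """Random-reset mutation: resample each parameter with probability `rate`."""
    return tuple(
        rng.randint(lo, hi) if rng.random() < rate else gene
        for gene, (lo, hi) in zip(cfg, BOUNDS)
    )


def genetic_search(model, pop_size=30, generations=40, seed=0):
    """Return the configuration with the lowest predicted runtime found."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the best third unchanged (elitism), refill with offspring.
        population.sort(key=model)
        elite = population[: pop_size // 3]
        children = []
        while len(elite) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(elite, 2)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = elite + children
    return min(population, key=model)


if __name__ == "__main__":
    best = genetic_search(predicted_runtime)
    print("best configuration:", best)
    print("predicted runtime:", predicted_runtime(best))
```

Because the search only ever calls the model, each candidate costs a prediction rather than a cluster run, which is what makes evaluating many configurations per generation affordable.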
Keywords/Search Tags:cluster computing, distributed system, in-memory cluster computing, machine learning