
Research On Optimal Allocation Strategy Of Spark Resources Based On Performance Prediction

Posted on: 2018-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Huang
GTID: 2348330533469804
Subject: Computer technology
Abstract/Summary:
Spark has become one of the most popular distributed big-data computing platforms. Thanks to its high performance, good fault tolerance, and completeness, it is widely used in industry. However, the Spark platform is largely opaque to the user, and job execution is influenced by many factors, such as the data partitioning strategy, the design and implementation of the algorithm, and the allocation of node resources. This makes predicting Spark performance difficult.

This thesis establishes a performance model based on the structure of a Spark job and studies the execution time of Spark applications under different data volumes and partitioning strategies. On this basis, we propose a resource allocation optimization strategy based on dynamic partitioning that seeks a trade-off between running time and resource consumption.

Building on fine-grained monitoring of cluster resources, we establish a Spark performance model based on job structure and train its parameters on a large volume of historical experimental data. We then study the effects of Spark's partitioning strategy and find that, although increasing the degree of parallelism (the number of partitions) can improve the performance of parallel computing jobs to some extent, in some cases the performance gain is negligible compared with the additional resource consumption. Once the user's requirement on task running time is met, a small further improvement in performance can be ignored; accordingly, we should reduce the allocated resources as far as the user's time requirement allows, so as to save resources.

We search for the best partitioning scheme by adding dynamic repartitioning to a series of real Spark computing jobs, and propose an optimal repartitioning strategy based on running-time prediction. By sacrificing some running time across multiple tasks, we save cluster resources, find a balance between execution time and resource consumption, and guide users toward rational use of cluster resources for their Spark applications.

Experiments verify the rationality of the performance model and the accuracy of job execution time prediction, achieving good prediction accuracy. We further propose resource allocation strategies based on performance prediction and optimize the allocation so as to reduce resource consumption. Experimental results show that our optimization strategy can significantly save resources while staying within the execution time given by the user, striking a good balance between execution time and resource consumption.
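The strategy described above, predicting execution time for candidate partition counts and then choosing the smallest allocation that still meets the user's deadline, can be sketched as follows. This is a minimal illustrative sketch: the model form and the names `predict_runtime` and `choose_partitions`, along with the coefficients, are assumptions for demonstration, not the thesis's actual trained model (which is fitted on Spark job history).

```python
# Sketch of a deadline-aware partition selector. The thesis trains its
# performance model on historical job data; here we substitute a toy
# analytical model purely for illustration.

def predict_runtime(data_size_gb, partitions, a=8.0, b=0.5):
    """Toy performance model (assumed form): per-partition work shrinks as
    partitions increase, while per-partition overhead (scheduling, shuffle)
    grows roughly linearly with the partition count."""
    return a * data_size_gb / partitions + b * partitions

def choose_partitions(data_size_gb, deadline_s, max_partitions=512):
    """Return the smallest partition count whose predicted runtime meets
    the user's deadline, i.e. the cheapest allocation that is still
    fast enough; None if no candidate satisfies the deadline."""
    for p in range(1, max_partitions + 1):
        if predict_runtime(data_size_gb, p) <= deadline_s:
            return p
    return None  # deadline infeasible under this model

if __name__ == "__main__":
    # For a 100 GB job with a 60 s deadline, the smallest satisfying
    # partition count under the toy model is 16 (58 s predicted).
    print(choose_partitions(data_size_gb=100, deadline_s=60))
```

The point of the loop scanning upward from 1 is exactly the trade-off the abstract describes: rather than maximizing parallelism for the best possible runtime, it stops at the first (cheapest) configuration that satisfies the user's time requirement.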
Keywords/Search Tags:Spark, Execution Time Prediction, Resource Allocation Optimization