Performance Prediction And Optimization Of Apache Spark Based On SRFRP Model

Posted on: 2018-05-12
Degree: Master
Type: Thesis
Country: China
Candidate: K X Wu
Full Text: PDF
GTID: 2348330533969823
Subject: Computer technology
Abstract/Summary:
As big data processing frameworks come into wide use, the need to predict the performance of big data applications keeps growing. Spark is a big data processing framework based on distributed in-memory computing, widely recognized in industry for its fast processing speed, good scalability, and fault tolerance. However, the execution time of a Spark workload varies greatly with the size of the input data, the design of the algorithm, the computing capacity of the cluster, and the cluster configuration, which makes Spark performance prediction a hard problem. Current approaches to Spark performance prediction rely mainly on machine learning or on system behavior modeling, and they suffer from poor generality and low accuracy.

This thesis proposes a Spark performance prediction algorithm based on a random forest regression model and graph edit distance, intended to address the generality problem of machine learning methods and the accuracy problem of system behavior modeling. The factors that affect Spark workload performance are divided into two categories: static features and dynamic features. With the static features as input and the workload execution time as output, a performance prediction model for a fixed workload type, the SRFRP model, is built from the historical run records of that workload type using the random forest regression algorithm. Training on the historical run records of multiple workload types yields an SRFRP model library. Then, using the workload's dynamic feature (its DAG) and the workload similarity measure proposed in this thesis, the corresponding SRFRP model is matched from the library, and the execution time of the target workload is predicted with the random forest method proposed in this thesis.

Based on this prediction algorithm, a Spark performance prediction and optimization system is implemented that both predicts and optimizes Spark workload performance. Experiments were run on a 41-node cluster. The results show that the SRFRP-based method accurately predicts the performance of a variety of Spark workloads, and that the prediction and optimization system improves workload efficiency by 20% or more. The work therefore supports performance prediction and optimization for a variety of workloads and also helps users improve the efficiency of their clusters.
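
The match-then-predict flow described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The similarity measure is a simplified stand-in for graph edit distance that matches nodes by operator label and counts only insertions and deletions. The names `Dag`, `make_dag`, `edit_cost`, `similarity`, `match_model`, and `predict`, the feature key `input_gb`, and the two example workloads are all hypothetical. The per-type predictors are placeholder functions standing in for trained random forest regressors.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

Edge = Tuple[str, str]


@dataclass(frozen=True)
class Dag:
    """A workload DAG reduced to operator labels and directed edges."""
    nodes: FrozenSet[str]
    edges: FrozenSet[Edge]


def make_dag(edges: Iterable[Edge]) -> Dag:
    edge_set = frozenset(edges)
    nodes = frozenset(n for edge in edge_set for n in edge)
    return Dag(nodes, edge_set)


def edit_cost(a: Dag, b: Dag) -> int:
    """Node and edge insertions/deletions needed to turn a into b,
    with nodes matched by label (a simplification of graph edit distance)."""
    return len(a.nodes ^ b.nodes) + len(a.edges ^ b.edges)


def similarity(a: Dag, b: Dag) -> float:
    """Normalize the edit cost into [0, 1]; 1.0 means identical DAGs."""
    total = len(a.nodes | b.nodes) + len(a.edges | b.edges)
    return 1.0 if total == 0 else 1.0 - edit_cost(a, b) / total


# Model library: workload type -> (reference DAG, predictor).
# A predictor maps static features to a predicted execution time.
Predictor = Callable[[Dict[str, float]], float]
Library = Dict[str, Tuple[Dag, Predictor]]


def match_model(query: Dag, library: Library) -> str:
    """Return the workload type whose reference DAG is most similar."""
    return max(library, key=lambda name: similarity(query, library[name][0]))


def predict(query: Dag, features: Dict[str, float], library: Library) -> float:
    """Match a model by DAG similarity, then apply it to the static features."""
    _, predictor = library[match_model(query, library)]
    return predictor(features)


# Hypothetical example workloads and placeholder predictors (assumptions).
wordcount = make_dag([("textFile", "flatMap"), ("flatMap", "map"),
                      ("map", "reduceByKey")])
sort = make_dag([("textFile", "map"), ("map", "sortByKey")])
library: Library = {
    "wordcount": (wordcount, lambda f: 2.0 * f["input_gb"]),
    "sort": (sort, lambda f: 5.0 * f["input_gb"]),
}

# A new job whose DAG resembles word count plus a final save step.
query = make_dag([("textFile", "flatMap"), ("flatMap", "map"),
                  ("map", "reduceByKey"), ("reduceByKey", "saveAsTextFile")])
matched = match_model(query, library)
estimate = predict(query, {"input_gb": 10.0}, library)
```

In a fuller version, each placeholder predictor would be replaced by a regressor trained on the historical run records of its workload type, and the label-based cost by a proper graph edit distance.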
Keywords/Search Tags: Spark, dynamic feature, performance prediction, static feature, random forest