
Research on Job Scheduling in the Spark Platform for a Workload-Mixing Data Center

Posted on: 2021-02-22    Degree: Master    Type: Thesis
Country: China    Candidate: S Li    Full Text: PDF
GTID: 2518306470469354    Subject: Computer technology
Abstract/Summary:
With workload-mixing technology, the resource fragments created by the resource-utilization fluctuations of latency-sensitive applications can be reclaimed and granted to batch applications. Statistics from production data centers show that workload mixing substantially improves resource utilization. Spark is the main host platform for batch applications under workload mixing. Beyond guaranteeing the quality of service (QoS) of latency-sensitive applications, how to improve the throughput of Spark batch applications under fluctuating resource provision remains an open issue in data-center job scheduling. To address this problem, this paper proposes a Spark job-scheduling strategy for workload mixing that aims to optimize both the QoS of Spark batch applications and the resource utilization of the data center. The main contributions of this paper are summarized as follows:

(1) The feasibility of classifying Spark workloads by execution time is analyzed quantitatively. Using representative Spark batch applications, the influence of input data size and resource allocation on execution time is demonstrated quantitatively. The results show that the execution times of different Spark batch applications differ significantly under the same input data size and resource allocation, while the execution-time-based classification of an application remains stable across input data sizes and resource allocations. This verifies that Spark batch applications are classifiable by execution time.

(2) An execution-time prediction method for Spark batch applications is proposed that accounts for differences in execution time across applications. First, the Spearman correlation coefficient and Mean Shift clustering are used to classify Spark batch applications by execution time. Then, PCA and a GBDT model are used to predict the execution time of each application category. Finally, when an ad-hoc application arrives, it is mapped to a specific category and its execution time is predicted with the corresponding model. Experiments show that, compared with a unified prediction method, the proposed classification-based method reduces the root mean square error and the mean absolute percentage error by 32.1% and 32.9% on average, respectively.

(3) A Spark job-scheduling strategy for workload mixing is proposed. The strategy schedules jobs based on the predicted execution times of Spark batch applications, aiming to maximize both the proportion of Spark batch applications that satisfy their soft real-time QoS and the resource utilization of the data center. The TSPSO algorithm is used to optimize the scheduling strategy. Experiments show that, compared with the FIFO, Fair, and DRF scheduling strategies commonly used in Spark, the proposed strategy improves the soft real-time satisfaction ratio, memory utilization, and CPU utilization of Spark batch applications by 25%, 32.5%, and 23.9% on average, respectively.
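The abstract names the classification tools (Spearman correlation and Mean Shift) but not their exact use. The following is a minimal sketch, assuming Mean Shift groups applications by their execution-time vectors and Spearman correlation checks that their ranking is stable across settings; the application timing data below is purely illustrative, not from the thesis.

```python
# Hypothetical sketch: cluster Spark batch applications by execution time
# with Mean Shift, and use Spearman correlation to check that the ranking
# of applications is stable across (input size, resource) settings.
import numpy as np
from scipy.stats import spearmanr
from sklearn.cluster import MeanShift

# Rows: applications; columns: execution time (s) under three different
# (input data size, resource allocation) settings. Illustrative values.
exec_times = np.array([
    [120.0, 240.0, 480.0],   # a long-running, input-scaling application
    [125.0, 250.0, 500.0],   # similar scaling behaviour
    [30.0,  35.0,  40.0],    # a short, lightly-scaling application
    [28.0,  33.0,  41.0],
])

# Mean Shift groups applications whose execution-time profiles are close.
labels = MeanShift(bandwidth=100.0).fit_predict(exec_times)

# Spearman correlation between two settings: a high value means the
# relative ordering of applications (hence their classification) is stable.
rho, _ = spearmanr(exec_times[:, 0], exec_times[:, 2])
print(labels, round(rho, 2))
```

With these toy values the two fast and two slow applications fall into separate clusters, and the ranking of applications is strongly preserved across settings, which mirrors the classifiability claim of contribution (1).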
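Contribution (2) combines PCA with a GBDT regressor per application category. A minimal sketch of that pipeline follows; the job features and the synthetic timing formula are assumptions for illustration only.

```python
# Hypothetical sketch of the per-category prediction step: reduce job
# features with PCA, then fit a GBDT regressor on execution time.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
# Illustrative features per job: input size (GB), executor cores,
# executor memory (GB), and three other resource-related features.
X = rng.uniform(1.0, 100.0, size=(200, 6))
# Synthetic "execution time": roughly input_size / cores, plus noise.
y = 60.0 * X[:, 0] / X[:, 1] + rng.normal(0.0, 5.0, size=200)

# PCA + GBDT in one pipeline, one such model per application category.
model = make_pipeline(PCA(n_components=4),
                      GradientBoostingRegressor(random_state=0))
model.fit(X, y)

# When an ad-hoc job arrives, its category's model predicts its runtime.
pred = model.predict(X[:5])
print(pred.shape)
```

In the thesis's scheme one such pipeline would be trained per execution-time category, and an incoming ad-hoc application would first be mapped to a category before its model is queried.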
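The abstract adopts TSPSO but gives no details of the variant. As a rough illustration of the particle swarm optimisation family it belongs to, here is a minimal standard PSO minimising a toy scheduling cost; the inertia and acceleration coefficients are generic PSO defaults, not the thesis's settings.

```python
# Minimal standard PSO, illustrating the optimisation family behind TSPSO.
# The cost function and all parameters are illustrative placeholders.
import numpy as np

def pso(cost, dim, n_particles=30, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5.0, 5.0, (n_particles, dim))   # particle positions
    v = np.zeros_like(x)                             # particle velocities
    pbest = x.copy()                                 # per-particle best
    pbest_val = np.apply_along_axis(cost, 1, x)
    gbest = pbest[pbest_val.argmin()].copy()         # global best
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # inertia 0.7, cognitive/social coefficients 1.5 (generic defaults)
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = x + v
        val = np.apply_along_axis(cost, 1, x)
        improved = val < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], val[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

# Toy scheduling cost: squared distance to an "ideal" allocation vector.
best, best_val = pso(lambda a: float(np.sum((a - 2.0) ** 2)), dim=3)
print(best.shape, best_val < 0.1)
```

In the thesis's setting, a particle would encode a candidate job-to-resource assignment and the cost would combine the soft real-time satisfaction ratio with data-center resource utilization.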
Keywords/Search Tags: Spark, Workload Mixture, Job Scheduling, Classification, Prediction