
Research On Workload-specific Memory Configuration Of Spark Workloads

Posted on: 2019-10-28
Degree: Master
Type: Thesis
Country: China
Candidate: S L Chang
GTID: 2428330593950168
Subject: Computer Science and Technology

Abstract/Summary:
Spark, the distributed in-memory computing platform, is the latest technological advance in big data processing. Memory is the core resource of the Spark platform, and a proper memory configuration provides performance guarantees for running Spark workloads. Memory configuration in Spark refers to allocating and managing memory resources for workloads based on workload characteristics and platform features. Currently, Spark memory configuration is a user-oriented static configuration. Because users generally lack knowledge of Spark's platform mechanisms, they tend to over-allocate memory, which lowers memory utilization and reduces workload execution efficiency in multi-workload scenarios.

To address these shortcomings of current Spark memory configuration methods, this thesis proposes a workload-specific method of Spark memory configuration. Based on a quantitative analysis of the memory access characteristics of Spark workloads, prediction models of workload memory requirements are built per workload category to achieve precise configuration of Spark memory. The main contributions are as follows.

(1) A framework for workload-specific Spark memory configuration. Quantitative experiments verify that Spark workloads are categorizable. Based on this conclusion, the proposed framework comprises two stages, offline and online. In the offline stage, Spark workloads are categorized by memory requirement, and a memory-requirement prediction model is built for each workload category. In the online stage, a small test data set is used to match a workload to a category, and the workload's memory requirement is estimated with that category's prediction model.

(2) An experience-based Spark memory configuration method. This method proposes a "data expansion rate" indicator to characterize the memory access behavior of Spark workloads, and uses this indicator and its rate of change to classify workloads. On top of this classification, it builds memory-requirement prediction models from empirical formulas for the memory requirements of the Spark benchmark workloads.

(3) A machine-learning-based memory configuration method. This method first selects memory access characteristic indices by examining and pruning the Spark hardware and software stack; second, it applies the K-Medoids algorithm to cluster typical Spark workloads according to these indices; third, for each workload category, it uses stepwise regression to automatically filter the platform configuration parameters that significantly affect the category's memory requirements, and then builds a support vector machine (SVM) regression model to forecast the memory requirement of each workload category.

(4) An evaluation of both memory configuration methods on a set of typical Spark workloads. Experimental results show that, compared with the system's default static configuration, both methods achieve better workload execution efficiency and memory utilization. Of the two, the machine-learning-based method yields more accurate memory-requirement predictions and higher memory utilization.
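The machine-learning pipeline described in contribution (3) can be sketched in outline: cluster workloads by their memory access features with K-Medoids, then fit an SVM regression model per cluster to predict memory requirements. The sketch below is illustrative only, not the thesis's implementation: the workload features, the data, and the feature values are hypothetical; the SVR comes from scikit-learn, while K-Medoids is a minimal NumPy implementation rather than the library-grade PAM algorithm.

```python
import numpy as np
from sklearn.svm import SVR

def k_medoids(X, k, n_iter=100, seed=0):
    """Minimal K-Medoids sketch: medoids are actual data points,
    distances are Euclidean."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(X), k, replace=False)
    for _ in range(n_iter):
        # Assign each point to its nearest medoid.
        d = np.linalg.norm(X[:, None] - X[medoids][None], axis=2)
        labels = d.argmin(axis=1)
        # For each cluster, pick the member minimizing total distance.
        new = []
        for c in range(k):
            members = np.flatnonzero(labels == c)
            pair = np.linalg.norm(X[members][:, None] - X[members][None], axis=2)
            new.append(members[pair.sum(axis=1).argmin()])
        new = np.array(new)
        if np.array_equal(new, medoids):
            break
        medoids = new
    return labels, medoids

rng = np.random.default_rng(42)
# Hypothetical workload features: [input size (GB), data expansion rate],
# drawn around two synthetic workload types.
X = np.vstack([rng.normal(m, 0.3, size=(30, 2)) for m in ((2.0, 1.0), (8.0, 3.0))])
# Hypothetical memory requirement (GB), loosely input size * expansion rate.
y = 0.5 * X[:, 0] * X[:, 1] + rng.normal(0, 0.1, len(X))

# Offline stage: cluster workloads, then fit one SVM regressor per cluster.
labels, _ = k_medoids(X, k=2)
models = {c: SVR(kernel="rbf", C=10).fit(X[labels == c], y[labels == c])
          for c in np.unique(labels)}

# Online stage: match a new workload to a cluster (here via its nearest
# training point), then predict memory with that cluster's model.
new_workload = np.array([[8.0, 3.0]])
cluster = labels[np.linalg.norm(X - new_workload, axis=1).argmin()]
pred = models[cluster].predict(new_workload)
print(f"predicted memory requirement: {float(pred[0]):.1f} GB")
```

The stepwise-regression step for filtering significant platform parameters is omitted here; in practice it would shrink the feature vector before the SVR fit.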
Keywords/Search Tags:big data, in-memory computing platform, Spark, memory configuration, machine learning