
Research On Performance Optimization And Parameter Configuration Strategy Of Spark Platform

Posted on: 2021-04-18 | Degree: Master | Type: Thesis
Country: China | Candidate: T W Fan | Full Text: PDF
GTID: 2428330614458464 | Subject: Computer technology
Abstract/Summary:
The advent of the data era has deepened awareness of data and information resources across industries, and every industry now faces the question of how to process data more quickly and accurately. Distributed frameworks for large-scale data processing and computing have emerged as a result. However, the Spark computing platform has numerous configuration parameters, which tend to be configured and modified manually according to operational experience and the given business scenario, so optimal cluster performance is often not achieved when Spark is used. Although the Spark framework provides two scheduling policies, FIFO and FAIR, they do not account for problems such as memory overflow caused by improper allocation of memory resources under some extreme circumstances, which can degrade cluster performance and waste cluster resources. In view of these problems, this thesis is divided into two parts.

The first part studies and analyzes in depth how the values of the Spark platform's configuration parameters influence cluster performance. Related references are consulted, and the parameter values are configured following a black-box approach. A LightGBM-based performance model of the Spark platform's configuration parameters is then proposed, which automatically selects configuration parameter values according to historical run data and the size of the input data, so that cluster performance can meet the needs of different business scenarios. The thesis also analyzes the Bayesian optimization algorithm in depth and uses it in building the configuration-parameter performance model, so that the model is efficient enough to serve more business needs and reaches its best performance. Analysis and verification on experimental data show that the model improves parameter configuration, cluster performance, and execution efficiency.

The second part analyzes the memory allocation method of the Spark platform and finds that when a task's data size and data type are unsuitable, the method suffers from overflow exceptions and other deficiencies. A memory optimization strategy based on long and short jobs is therefore proposed. The strategy consists of calculating Task feedback weights, allocating memory based on those feedback weights, and a multi-level feedback scheduling method for tasks. It divides Tasks into long and short tasks according to the speed and length of their data reads and writes; the feedback weight and priority of each task are then calculated jointly at the local task-scheduling level. Memory space is then allocated on the basis of the feedback weight, and each Task is executed under the scheduling strategy. Experiments on job data of uneven lengths show that the proposed memory optimization strategy allocates memory resources more appropriately.
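To make the second part's strategy more concrete, the following is a minimal Python sketch of the three steps it names: classifying tasks as long or short, computing a feedback weight, and allocating memory in proportion to that weight. The abstract does not give the thesis's actual formulas, so everything specific here is an assumption made for illustration: the `Task` fields, the `threshold_s` cutoff, the weight formula, and the simplified two-level queue (a stand-in for the multi-level feedback scheduling, not a full implementation of it). It does not use or model any Spark API.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical task description; field names are not from the thesis.
    name: str
    io_bytes: int    # amount of data the task reads and writes
    io_rate: float   # observed read/write throughput, in bytes per second

    @property
    def est_io_seconds(self) -> float:
        """Estimated time spent on data reads and writes."""
        return self.io_bytes / self.io_rate


def classify(task: Task, threshold_s: float) -> str:
    """Label a task 'short' or 'long' by its estimated I/O time (assumed rule)."""
    return "short" if task.est_io_seconds <= threshold_s else "long"


def feedback_weight(task: Task, threshold_s: float) -> float:
    """Assumed weight formula: a value in (0, 1) that shrinks as I/O time grows."""
    return threshold_s / (threshold_s + task.est_io_seconds)


def allocate_memory(tasks: list[Task], total_mb: float, threshold_s: float) -> dict[str, float]:
    """Split total_mb among the tasks in proportion to their feedback weights."""
    weights = {t.name: feedback_weight(t, threshold_s) for t in tasks}
    weight_sum = sum(weights.values())
    return {name: total_mb * w / weight_sum for name, w in weights.items()}


def schedule(tasks: list[Task], threshold_s: float) -> list[Task]:
    """Two-level queue: short tasks first, then long; higher weight first within a level."""
    return sorted(
        tasks,
        key=lambda t: (classify(t, threshold_s) == "long", -feedback_weight(t, threshold_s)),
    )


if __name__ == "__main__":
    demo = [
        Task("shuffle_heavy", io_bytes=300, io_rate=10.0),
        Task("small_map", io_bytes=100, io_rate=10.0),
    ]
    for task in schedule(demo, threshold_s=10.0):
        print(task.name, classify(task, threshold_s=10.0))
    print(allocate_memory(demo, total_mb=3000.0, threshold_s=10.0))
```

Favoring short tasks in both the weight and the queue order is a design choice of this sketch only; the thesis may weight tasks differently, and a real multi-level feedback scheduler would also move tasks between levels based on their observed behavior.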
Keywords/Search Tags:Spark, lightGBM, Memory Optimization, Feedback Weight, Performance Model