Research On Performance Optimization And Parameter Configuration Strategy Of Spark Platform

Posted on:2021-04-18

Degree:Master

Type:Thesis

Country:China

Candidate:T W Fan

Full Text:PDF

GTID:2428330614458464

Subject:Computer technology

Abstract/Summary:

The arrival of the data era has deepened every industry's awareness of data as a resource, and every industry now faces the question of how to process data more quickly and accurately. Distributed frameworks for large-scale data processing and computing have emerged in response. The Spark computing platform, however, has numerous configuration parameters, which tend to be set and modified manually according to operating experience and the business scenario at hand. As a result, Spark often fails to achieve optimal cluster performance. Moreover, although the Spark framework provides two scheduling modes, FIFO and FAIR, neither accounts for problems such as memory overflow caused by improper allocation of memory resources under extreme circumstances, which can degrade cluster performance and waste cluster resources. To address these problems, this thesis is divided into two parts.

The first part studies in depth how the values of Spark's configuration parameters influence cluster performance. Drawing on the related literature, parameter values are configured following a black-box approach. A LightGBM-based performance model of Spark's configuration parameters is then proposed, which automatically selects configuration parameter values according to historical run data and the size of the input data, so that cluster performance can meet the needs of different business scenarios. The thesis also analyzes the Bayesian optimization algorithm and uses it in building the configuration parameter performance model, so that the model is efficient enough to serve more business needs and reaches its best performance. Analysis and verification on experimental data show that the resulting model improves parameter configuration, cluster performance, and execution efficiency.

The second part analyzes Spark's memory allocation method and finds that it exhibits overflow exceptions and other deficiencies when the data size and data type of a task are unreasonable. A memory optimization strategy based on long and short jobs is therefore proposed. The strategy consists of three components: calculating Task feedback weights, allocating memory based on those weights, and multi-level feedback scheduling of tasks. It divides Tasks into long and short tasks according to the speed and length of their data reads and writes, and the feedback weight and priority of each task are calculated jointly at the task's local scheduling level. Memory is then allocated on the basis of the feedback weight, and the Task is executed under the scheduling strategy. Experiments on job data of uneven lengths show that the proposed strategy allocates memory resources more reasonably.
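
The abstract does not give the formulas behind the feedback weights, so the following is only a minimal sketch of the general idea of weight-proportional memory allocation for long and short tasks. Everything in it is an assumption made for illustration: the `Task` fields, the 10-second threshold, the factor of 2 for short tasks, and the function names are hypothetical and are not taken from the thesis or from Spark's API.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    data_mb: int        # assumed: amount of data the task reads/writes, in MB
    mb_per_sec: float   # assumed: observed read/write speed, in MB/s


def is_short(task: Task, threshold_sec: float = 10.0) -> bool:
    # Assumed rule: a task is "short" if its estimated I/O time
    # (data length divided by speed) does not exceed the threshold.
    return task.data_mb / task.mb_per_sec <= threshold_sec


def feedback_weight(task: Task) -> float:
    # Assumed formula (not from the thesis): the weight grows with the
    # task's data size, and short tasks get a priority factor of 2.
    factor = 2.0 if is_short(task) else 1.0
    return factor * task.data_mb


def allocate_memory(total_mb: int, tasks: list[Task]) -> dict[str, int]:
    # Split total_mb among tasks in proportion to their feedback weights.
    weights = {t.name: feedback_weight(t) for t in tasks}
    weight_sum = sum(weights.values())
    shares = {name: int(total_mb * w / weight_sum) for name, w in weights.items()}
    # Integer truncation may leave a few MB unassigned; give the
    # remainder to the task with the largest weight.
    leftover = total_mb - sum(shares.values())
    if leftover:
        shares[max(weights, key=weights.get)] += leftover
    return shares


def schedule_order(tasks: list[Task]) -> list[str]:
    # Assumed policy: short tasks run before long ones; within each
    # group, higher feedback weight goes first.
    ranked = sorted(tasks, key=lambda t: (not is_short(t), -feedback_weight(t)))
    return [t.name for t in ranked]


tasks = [Task("A", data_mb=100, mb_per_sec=50.0),
         Task("B", data_mb=600, mb_per_sec=10.0)]
print(allocate_memory(1000, tasks))
print(schedule_order(tasks))
```

Under these assumed rules, task A (about 2 seconds of I/O) counts as short and task B (about 60 seconds) as long, and the memory pool is divided according to the resulting weights. A real implementation would take its weights from runtime feedback rather than from fixed constants.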

Keywords/Search Tags:

Spark, LightGBM, Memory Optimization, Feedback Weight, Performance Model

Related items

1	Research On Memory Optimization Algorithm Based On Weight Priority Task Scheduling Strategy In Spark Platform
2	Research On Significant Technologies Of Performance Optimization On In-memory Computing Framework
3	Research On Memory Optimization Technology Of Spark Computing Engine
4	Research And Implementation Of Performance Modeling And Optimization Technology Of Spark Computing Framework
5	Design And Implementation Of O2O Coupon Usage Forecast System Based On LightGBM
6	The Implementation Of Remote-Memory Management System And Performance Optimization In Spark
7	Research On Job Scheduling And Memory Cache Optimization Based On SPARK
8	Research On Spark Performance Optimization Technology For In-Memory Computing
9	Research And Implementation Of Spark Performance Optimization For Police Data Processing
10	Performance Prediction And Optimization For Apache Spark Platform