
Research And Optimization On Resource Usage And Allocation Strategy For Spark

Posted on: 2019-09-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y Li
Full Text: PDF
GTID: 2428330563492466
Subject: Computer system architecture
Abstract/Summary:
With the advent of the "Internet Plus" era, big data technology has become a hot topic across many industries. Spark, a memory-based distributed computing framework, offers fast execution, strong generality, and other excellent features, which have earned it wide attention and application. However, Spark's default resource allocation strategy is static and cannot meet the needs of different applications. Whether sharing cluster resources with others in a production environment, or using rented services efficiently to reduce costs on a cloud platform, Spark users need to optimize the allocation and use of cluster resources while still meeting application requirements. How to exploit Spark's features to develop applications and optimize their performance, and how to allocate cluster resources reasonably, are challenges that users face.

Based on a study of Spark's operating mechanism and resource management, two optimization methods for Spark applications are proposed: (a) optimizing the RDD persistence strategy, and (b) optimizing application parallelism. With BigDataBench as the test workload, performance optimization is performed for three typical iterative applications, ALS, KMeans, and PageRank, which are widely used in popular Internet services such as search engines, social networks, and e-commerce.

Exploiting the observation that the memory requirements of Spark iterative applications tend to converge quickly, their resource usage is modeled, and Resource Dynamic Feedback Scheduling (RDFS) is designed and implemented to predict the resource usage of iterative applications and optimize the allocation of cluster resources. Spark's memory usage is analyzed, and the impact of different RDD storage levels on Spark applications when memory cannot meet requirements is explored, so as to optimize the memory usage of Spark applications and improve their execution efficiency.

The test results show that the execution time of the three optimized typical iterative Spark applications is greatly shortened, and their execution efficiency is significantly improved. RDFS keeps the iterative applications running normally and efficiently, improves overall resource utilization by releasing redundant system resources, and shortens the concurrent execution time of multiple applications. When memory is insufficient, no single persistence method can guarantee optimal execution efficiency for all applications; in some cases, an appropriate storage level improves execution efficiency by more than 70 percent.
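The finding that no single storage level is best when memory is short can be illustrated with a small, purely hypothetical decision helper. In a real Spark program the choice is applied via `rdd.persist(StorageLevel.MEMORY_AND_DISK_SER)` (and similar); the pure-Python sketch below only mimics the tradeoff the abstract describes (deserialized in-memory caching is fastest when the RDD fits; serialized or disk-backed levels trade CPU or I/O for a smaller footprint). The function name and size heuristic are illustrative assumptions, not the thesis's actual method.

```python
# Illustrative (not from the thesis) heuristic for picking a Spark RDD
# storage level when available executor memory is limited. In real Spark
# code the chosen level would be passed to rdd.persist(); here we just
# return the level's name.

def choose_storage_level(rdd_size_gb, free_mem_gb, ser_ratio=0.5):
    """Pick a storage-level name from RDD size vs. free memory.
    `ser_ratio` is an assumed size reduction from serialization."""
    if rdd_size_gb <= free_mem_gb:
        return "MEMORY_ONLY"            # fits deserialized: fastest access
    if rdd_size_gb * ser_ratio <= free_mem_gb:
        return "MEMORY_ONLY_SER"        # fits only after serialization
    return "MEMORY_AND_DISK_SER"        # spill the remainder to disk

print(choose_storage_level(6.0, 8.0))   # → MEMORY_ONLY
print(choose_storage_level(12.0, 8.0))  # → MEMORY_ONLY_SER
print(choose_storage_level(20.0, 8.0))  # → MEMORY_AND_DISK_SER
```

The three returned names correspond to real Spark storage levels; the thresholds at which each one wins in practice depend on the workload, which is exactly why the thesis measures several levels per application.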
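Parallelism optimization (method (b)) is typically expressed in Spark through configuration. The fragment below is only an illustrative assumption of the kind of knobs involved; the class name, jar name, and values are made up, not the thesis's tuned settings.

```shell
# Illustrative only: example values, not the thesis's tuned settings.
# spark.default.parallelism sets the default number of partitions used by
# RDD shuffle operations (e.g. reduceByKey); spark.sql.shuffle.partitions
# is its Spark SQL counterpart. KMeansApp and your-app.jar are hypothetical.
spark-submit \
  --class com.example.KMeansApp \
  --conf spark.default.parallelism=200 \
  --conf spark.sql.shuffle.partitions=200 \
  your-app.jar
```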
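The RDFS idea described above rests on the observation that iterative applications' memory usage converges quickly, so steady-state demand can be predicted and surplus resources released. The abstract gives no code, so the following is a minimal, hypothetical Python sketch of one way such convergence detection could work; the function name, tolerance, window, and safety margin are all illustrative assumptions, not the thesis's implementation.

```python
# Hypothetical sketch of RDFS-style memory prediction: watch per-iteration
# memory samples; once successive samples change by less than a tolerance,
# treat usage as converged and predict the steady-state requirement plus a
# safety margin, so surplus executor memory could be released.

def predict_steady_memory(samples, tol=0.05, window=3, margin=1.1):
    """Return predicted steady-state memory (with safety margin) once the
    last `window` deltas are below `tol` relative change, else None."""
    if len(samples) < window + 1:
        return None  # not enough iterations observed yet
    recent = samples[-(window + 1):]
    for prev, cur in zip(recent, recent[1:]):
        if prev == 0 or abs(cur - prev) / prev > tol:
            return None  # still fluctuating: keep observing
    return recent[-1] * margin  # converged: reserve steady usage + margin

# Example: memory (GB) sampled per iteration of an iterative job; usage
# ramps up, then settles, and the predictor returns a reservation target.
usage = [2.0, 5.5, 7.8, 8.1, 8.2, 8.2, 8.21]
print(predict_steady_memory(usage))  # prints a value slightly above 8.21
```

A real scheduler would additionally feed the prediction back into the cluster manager (hence "dynamic feedback"), shrinking allocations for converged jobs so that other concurrent applications can use the freed resources.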
Keywords/Search Tags:Big Data, Distributed Computing, Resource Dynamic Feedback Scheduling, Storage Level