
Research And Optimization On Resource Usage And Allocation Strategy For Spark

Posted on: 2019-09-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y Li
Full Text: PDF
GTID: 2428330563492466
Subject: Computer system architecture
Abstract/Summary:
With the advent of the "Internet Plus" era, big data technology has become a hot topic across many industries. Spark, a memory-based distributed computing framework, offers fast execution, strong generality, and other excellent features, which have earned it wide attention and application. However, Spark's default resource allocation strategy is static and cannot meet the needs of different applications. Whether sharing cluster resources with others in a production environment, or using rented services efficiently to reduce costs on a cloud platform, Spark users need to optimize the allocation and use of cluster resources while still meeting application requirements. How to exploit Spark's features to develop applications and optimize their performance, and how to allocate cluster resources reasonably, are challenges that users face.

Based on a study of Spark's operating mechanism and resource management, two optimization methods for Spark applications are proposed: (a) optimizing the RDD persistence strategy, and (b) optimizing application parallelism. With BigDataBench as the test workload, performance optimization is performed for three typical iterative applications, ALS, KMeans, and PageRank, which are widely used in popular Internet services such as search engines, social networks, and e-commerce.

Exploiting the observation that the memory requirements of Spark iterative applications tend to converge quickly, their resource usage is modeled, and Resource Dynamic Feedback Scheduling (RDFS) is designed and implemented to predict the resource usage of iterative applications and optimize the allocation of cluster resources. Spark's memory usage is analyzed, and the impact of different RDD storage levels on Spark applications when memory cannot meet requirements is explored, so as to optimize the memory usage of Spark applications and improve their execution efficiency.

The test results show that the execution time of the three optimized typical iterative Spark applications is greatly shortened, and their execution efficiency is significantly improved. RDFS keeps the iterative applications running normally and efficiently, improves overall resource utilization by releasing redundant system resources, and shortens the concurrent execution time of multiple applications. When memory is insufficient, no single persistence method can guarantee optimal execution efficiency for all applications; in some cases, an appropriate storage level improves execution efficiency by more than 70 percent.
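The finding that no single storage level is best when memory is short can be illustrated with a small, purely hypothetical decision helper. In a real Spark program the choice is applied via `rdd.persist(StorageLevel.MEMORY_AND_DISK_SER)` (and similar); the pure-Python sketch below only mimics the tradeoff the abstract describes (deserialized in-memory caching is fastest when the RDD fits; serialized or disk-backed levels trade CPU or I/O for a smaller footprint). The function name and size heuristic are illustrative assumptions, not the thesis's actual method.

```python
# Illustrative (not from the thesis) heuristic for picking a Spark RDD
# storage level when available executor memory is limited. In real Spark
# code the chosen level would be passed to rdd.persist(); here we just
# return the level's name.

def choose_storage_level(rdd_size_gb, free_mem_gb, ser_ratio=0.5):
    """Pick a storage-level name from RDD size vs. free memory.
    `ser_ratio` is an assumed size reduction from serialization."""
    if rdd_size_gb <= free_mem_gb:
        return "MEMORY_ONLY"            # fits deserialized: fastest access
    if rdd_size_gb * ser_ratio <= free_mem_gb:
        return "MEMORY_ONLY_SER"        # fits only after serialization
    return "MEMORY_AND_DISK_SER"        # spill the remainder to disk

print(choose_storage_level(6.0, 8.0))   # → MEMORY_ONLY
print(choose_storage_level(12.0, 8.0))  # → MEMORY_ONLY_SER
print(choose_storage_level(20.0, 8.0))  # → MEMORY_AND_DISK_SER
```

The three returned names correspond to real Spark storage levels; the thresholds at which each one wins in practice depend on the workload, which is exactly why the thesis measures several levels per application.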
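Parallelism optimization (method (b)) is typically expressed in Spark through configuration. The fragment below is only an illustrative assumption of the kind of knobs involved; the class name, jar name, and values are made up, not the thesis's tuned settings.

```shell
# Illustrative only: example values, not the thesis's tuned settings.
# spark.default.parallelism sets the default number of partitions used by
# RDD shuffle operations (e.g. reduceByKey); spark.sql.shuffle.partitions
# is its Spark SQL counterpart. KMeansApp and your-app.jar are hypothetical.
spark-submit \
  --class com.example.KMeansApp \
  --conf spark.default.parallelism=200 \
  --conf spark.sql.shuffle.partitions=200 \
  your-app.jar
```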
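The RDFS idea described above rests on the observation that iterative applications' memory usage converges quickly, so steady-state demand can be predicted and surplus resources released. The abstract gives no code, so the following is a minimal, hypothetical Python sketch of one way such convergence detection could work; the function name, tolerance, window, and safety margin are all illustrative assumptions, not the thesis's implementation.

```python
# Hypothetical sketch of RDFS-style memory prediction: watch per-iteration
# memory samples; once successive samples change by less than a tolerance,
# treat usage as converged and predict the steady-state requirement plus a
# safety margin, so surplus executor memory could be released.

def predict_steady_memory(samples, tol=0.05, window=3, margin=1.1):
    """Return predicted steady-state memory (with safety margin) once the
    last `window` deltas are below `tol` relative change, else None."""
    if len(samples) < window + 1:
        return None  # not enough iterations observed yet
    recent = samples[-(window + 1):]
    for prev, cur in zip(recent, recent[1:]):
        if prev == 0 or abs(cur - prev) / prev > tol:
            return None  # still fluctuating: keep observing
    return recent[-1] * margin  # converged: reserve steady usage + margin

# Example: memory (GB) sampled per iteration of an iterative job; usage
# ramps up, then settles, and the predictor returns a reservation target.
usage = [2.0, 5.5, 7.8, 8.1, 8.2, 8.2, 8.21]
print(predict_steady_memory(usage))  # prints a value slightly above 8.21
```

A real scheduler would additionally feed the prediction back into the cluster manager (hence "dynamic feedback"), shrinking allocations for converged jobs so that other concurrent applications can use the freed resources.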
Keywords/Search Tags:Big Data, Distributed Computing, Resource Dynamic Feedback Scheduling, Storage Level