
Research On Job Scheduling And Memory Cache Optimization Based On SPARK

Posted on: 2020-08-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y P Zhang
Full Text: PDF
GTID: 2428330575975782
Subject: Computer software and theory

Abstract/Summary:
With the rapid development of cloud computing and big data technology, Spark, a large-scale data processing framework based on in-memory computing, has been widely adopted, and research on improving its task execution efficiency has become a hot topic. In Spark, data caching, reading, and computation are all carried out in memory, which greatly reduces the time spent transferring data between memory and disk and improves task execution efficiency. However, to further improve Spark's computational performance, two open problems remain: designing efficient job scheduling algorithms and making more efficient use of memory resources. This paper therefore studies improvements to both the job scheduling algorithm and the memory usage mechanism of the Spark platform. The main contributions are as follows:

(1) Spark job scheduling based on a genetic and tabu algorithm. This paper adopts the Spark On Yarn deployment mode and proposes a new job scheduling scheme to address the shortcomings of several scheduling algorithms available in Yarn. By studying the evolution of the genetic algorithm's population, we propose an improved optimal-preservation (elitist) strategy and a Modified Adaptive Genetic Algorithm (MAGA) for the crossover and mutation operations. By merging MAGA with a tabu search algorithm, we then obtain a modified adaptive genetic tabu algorithm for Spark job scheduling (a minimal sketch of this hybrid appears below). Experiments show that this scheduling algorithm effectively reduces task execution time and improves task performance.

(2) Research on and improvement of Spark memory cache management. The RDD is Spark's core abstract data model. Targeting the selection of RDDs to cache and the improvement of the LRU replacement algorithm, this paper proposes an RDD cache prediction mechanism, together with a weight model and weight-update mechanism based on RDD partition characteristics that uses the entropy method (also sketched below), in order to optimize memory utilization.

Finally, we build a Hadoop and Spark cluster environment and use the Spark On Yarn deployment to evaluate the two improvements. First, for the modified adaptive genetic tabu algorithm, after verifying its effectiveness in a simulation environment, we further verify in the cluster environment that the job scheduling algorithm effectively reduces task execution time and improves task execution efficiency. Then, the validity of the RDD cache prediction mechanism and the optimized replacement algorithm is verified in the same experimental environment. The experimental results show that these methods effectively reduce task execution time and improve memory utilization.
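As an illustration of contribution (1), the following is a minimal, self-contained Scala sketch of an adaptive genetic algorithm with elitist preservation and a tabu list, applied to assigning tasks to executors. Every name here (Chromosome, the fitness model, the rate constants, the task-cost array) is an illustrative assumption, not the thesis's actual MAGA implementation; the adaptive rates follow the common Srinivas-Patnaik scheme, in which above-average individuals receive smaller crossover and mutation probabilities.

    import scala.util.Random

    // Hedged sketch: adaptive GA + tabu list for task-to-executor assignment.
    // All constants and the cost model are illustrative assumptions.
    object MagaTabuSketch {
      val rand = new Random(42)
      val numTasks = 20        // tasks to schedule
      val numExecutors = 4     // executors (Yarn containers)
      // Assumed per-task costs; a real scheduler would estimate these.
      val taskCost: Array[Double] = Array.fill(numTasks)(1.0 + rand.nextDouble() * 9.0)

      type Chromosome = Array[Int] // chromosome(i) = executor assigned to task i

      // Fitness: negative makespan (max executor load), so higher is better.
      def fitness(c: Chromosome): Double = {
        val load = new Array[Double](numExecutors)
        for (i <- c.indices) load(c(i)) += taskCost(i)
        -load.max
      }

      // Adaptive rate: individuals at or above average fitness get a rate
      // scaled down toward zero as they approach the best individual.
      def adaptiveRate(f: Double, fAvg: Double, fMax: Double,
                       kHigh: Double, kLow: Double): Double =
        if (f >= fAvg && fMax != fAvg) kHigh * (fMax - f) / (fMax - fAvg) else kLow

      // One-point crossover between two assignments.
      def crossover(a: Chromosome, b: Chromosome): Chromosome = {
        val cut = rand.nextInt(numTasks)
        a.take(cut) ++ b.drop(cut)
      }

      // Mutation: reassign one random task to a random executor.
      def mutate(c: Chromosome): Chromosome = {
        val copy = c.clone()
        copy(rand.nextInt(numTasks)) = rand.nextInt(numExecutors)
        copy
      }

      def main(args: Array[String]): Unit = {
        var pop = Array.fill(50)(Array.fill(numTasks)(rand.nextInt(numExecutors)))
        var tabu = List.empty[Int] // tabu list of recently seen solution hashes
        var best = pop.maxBy(fitness)

        for (_ <- 1 to 200) {
          val fits = pop.map(fitness)
          val (fAvg, fMax) = (fits.sum / fits.length, fits.max)
          pop = pop.map { c =>
            val f = fitness(c)
            var child =
              if (rand.nextDouble() < adaptiveRate(f, fAvg, fMax, 0.9, 0.9))
                crossover(c, pop(rand.nextInt(pop.length)))
              else c
            if (rand.nextDouble() < adaptiveRate(f, fAvg, fMax, 0.1, 0.1))
              child = mutate(child)
            // Tabu step: reject moves that revisit a recent solution.
            if (tabu.contains(child.toSeq.hashCode)) c else child
          }
          val gBest = pop.maxBy(fitness)
          if (fitness(gBest) > fitness(best)) best = gBest // elitist preservation
          tabu = (best.toSeq.hashCode :: tabu).take(20)
        }
        println(f"Best makespan: ${-fitness(best)}%.2f")
      }
    }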
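Similarly, the entropy-method weighting in contribution (2) can be sketched as follows: partitions are scored from assumed features (recomputation cost, predicted reference count, size), with feature weights derived from the entropy method, and the lowest-scoring partition is evicted first instead of the least recently used one. The PartitionStats fields and the scoring rule are assumptions for illustration; the thesis's actual weight model and update mechanism are not reproduced here.

    // Hedged sketch: entropy-weighted eviction score for cached RDD partitions.
    object EntropyWeightSketch {
      // Hypothetical per-partition features, not Spark's internal bookkeeping.
      case class PartitionStats(id: String, cost: Double, refs: Double, sizeMb: Double)

      // Column-normalize so each feature column sums to 1 (the p_ij matrix).
      def normalize(rows: Seq[Array[Double]]): Seq[Array[Double]] = {
        val n = rows.head.length
        val colSums = Array.tabulate(n)(j => rows.map(_(j)).sum)
        rows.map(r => Array.tabulate(n)(j => r(j) / colSums(j)))
      }

      // Entropy method: features that vary more across partitions carry more
      // information and therefore receive a larger weight.
      def entropyWeights(p: Seq[Array[Double]]): Array[Double] = {
        val m = p.length; val n = p.head.length
        val k = 1.0 / math.log(m)
        val entropy = Array.tabulate(n) { j =>
          -k * p.map(r => if (r(j) > 0) r(j) * math.log(r(j)) else 0.0).sum
        }
        val div = entropy.map(1.0 - _)   // degree of divergence per feature
        div.map(_ / div.sum)             // normalized weights
      }

      def main(args: Array[String]): Unit = {
        val cached = Seq(
          PartitionStats("rdd_3_p0", cost = 8.0, refs = 3, sizeMb = 64),
          PartitionStats("rdd_5_p1", cost = 2.0, refs = 1, sizeMb = 128),
          PartitionStats("rdd_7_p2", cost = 5.0, refs = 2, sizeMb = 32))
        // Larger size should lower the score, so invert it before weighting.
        val p = normalize(cached.map(s => Array(s.cost, s.refs, 1.0 / s.sizeMb)))
        val w = entropyWeights(p)
        val scores = cached.zip(p).map { case (s, r) =>
          s.id -> r.zip(w).map { case (v, wi) => v * wi }.sum
        }
        // Evict the lowest-scoring partition first, instead of the LRU one.
        scores.sortBy(_._2).foreach { case (id, sc) => println(f"$id%-10s $sc%.4f") }
      }
    }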
Keywords/Search Tags:Spark, RDD, genetic and tabu algorithm, weight updating mechanism, cache prediction mechanism