
Research On Memory Optimization Algorithm Based On Weight Priority Task Scheduling Strategy In Spark Platform

Posted on: 2019-07-30
Degree: Master
Type: Thesis
Country: China
Candidate: X L Chen
Full Text: PDF
GTID: 2428330590965714
Subject: Computer Science and Technology
Abstract/Summary:
In the era of rapid development of big data, traditional data storage and computing capabilities can no longer meet the needs of the public, and Spark has become a typical representative of current distributed computing frameworks. However, as Spark develops and cluster scale expands rapidly, how to make rational use of cluster resources has become a hot research topic. Shuffle is an important stage between Map and Reduce, so the performance of the Shuffle phase directly affects the operating efficiency of the whole system. This paper studies and improves the Spark platform from two aspects of the Spark Shuffle process: Task scheduling and memory allocation. The research work is divided into two parts: a Task scheduling policy based on weight priority, and a secondary allocation algorithm that uses free memory. The details are as follows:

1. To address the load imbalance that the default Task scheduling policy causes on Worker nodes, this paper proposes a Task scheduling policy based on weight priority. First, Tasks are classified according to their resource requirements and reading speed. Then each Worker is monitored in real time, and the weight of each Worker node's computing power is calculated using the node's CPU utilization, memory utilization, and the load length of its Tasks as indices. Finally, Tasks and Worker nodes are mapped for scheduling (see the first sketch after the abstract). Experimental results show that the algorithm improves system performance, saving 7.21% of running time compared with the existing improved algorithm.

2. To address the reduced cluster resource utilization caused by the default memory allocation algorithm in Spark Shuffle, this paper proposes an optimized algorithm that performs a secondary allocation using free memory. First, Tasks are classified according to their resource requirements. Then, at the first allocation, large Tasks are given an extra block to prevent memory overflow. Finally, the memory freed while Tasks run, together with the memory space that was never used, is allocated preferentially to Tasks that have spilled (see the second sketch after the abstract). Experimental results show that the proposed algorithm reduces memory overflow and memory waste; in the case of non-uniform data, it saves 6.6% of running time and lowers memory spillover by 10.8% compared with the existing improved algorithm.
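The weight-priority scheduling idea in item 1 can be illustrated with a short Scala sketch. This is a minimal illustration under stated assumptions, not the thesis's implementation or Spark's scheduler API: the names WorkerStats, TaskInfo, and weight, the exact weight formula, and the coefficients alpha, beta, and gamma are all hypothetical.

    object WeightPriorityScheduler {

      // Snapshot of a Worker node's monitored state (assumed fields).
      case class WorkerStats(id: String, cpuUtil: Double, memUtil: Double, queuedTasks: Int)

      // A Task with a coarse class: "large" by resource demand / reading speed.
      case class TaskInfo(id: Long, isLarge: Boolean)

      // Weight of a Worker's remaining computing power: lower CPU and memory
      // utilization and a shorter Task queue yield a higher weight.
      // alpha/beta/gamma are assumed tuning coefficients, not from the thesis.
      def weight(w: WorkerStats,
                 alpha: Double = 0.4, beta: Double = 0.4, gamma: Double = 0.2): Double =
        alpha * (1.0 - w.cpuUtil) + beta * (1.0 - w.memUtil) + gamma / (1.0 + w.queuedTasks)

      // Map Tasks to Workers: rank Workers by weight, dispatch large Tasks
      // first, and spread Tasks round-robin over the ranked Workers.
      def schedule(tasks: Seq[TaskInfo], workers: Seq[WorkerStats]): Map[Long, String] = {
        val ranked  = workers.sortBy(w => -weight(w))
        val ordered = tasks.sortBy(t => !t.isLarge) // false sorts first, so large Tasks lead
        ordered.zipWithIndex.map { case (t, i) =>
          t.id -> ranked(i % ranked.size).id
        }.toMap
      }
    }

In this sketch a lightly loaded Worker gets the highest weight, so the heaviest Tasks land on the least-loaded nodes first; in practice the coefficients would be tuned experimentally against the real-time monitoring data the abstract describes.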
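The secondary-allocation idea in item 2 can likewise be sketched. The pool model, the 4 MB safety block, and all names below are illustrative assumptions; this is neither Spark's MemoryManager nor the thesis's code.

    object SecondaryAllocator {

      // A Task's memory request, its size class, and whether it has spilled.
      case class TaskReq(id: Long, requestedBytes: Long, isLarge: Boolean, spilled: Boolean)

      // Assumed extra safety block granted to large Tasks (size is hypothetical).
      val Block: Long = 4L * 1024 * 1024

      // First allocation: large Tasks receive one extra block so they are
      // less likely to overflow memory and spill to disk.
      def firstAllocation(tasks: Seq[TaskReq], pool: Long): (Map[Long, Long], Long) = {
        var free = pool
        val grants = tasks.map { t =>
          val want = t.requestedBytes + (if (t.isLarge) Block else 0L)
          val got  = math.min(want, free)
          free -= got
          t.id -> got
        }.toMap
        (grants, free) // grants per Task, plus the never-used remainder
      }

      // Secondary allocation: memory freed by finished Tasks plus the unused
      // remainder is handed out first to the Tasks that spilled, largest first.
      def secondaryAllocation(tasks: Seq[TaskReq], freed: Long): Map[Long, Long] = {
        var free = freed
        tasks.filter(_.spilled).sortBy(-_.requestedBytes).map { t =>
          val got = math.min(t.requestedBytes, free)
          free -= got
          t.id -> got
        }.toMap
      }
    }

The design point the sketch captures is that idle memory is reused twice: once as a preventive cushion for large Tasks at the first allocation, and again as a recovery pool for Tasks that spilled anyway.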
Keywords/Search Tags: Spark, memory optimization, weight priority, scheduling policy, secondary allocation