
Research On Memory Optimization Technology Of Spark Computing Engine

Posted on: 2019-12-06
Degree: Master
Type: Thesis
Country: China
Candidate: W Zhang
Full Text: PDF
GTID: 2428330548994986
Subject: Computer Science and Technology
Abstract/Summary:
With the spread of the Internet, the rise of the Internet of Things, and the rapid development of communication technology, the volume of data generated by users is growing exponentially. A traditional single server can no longer process data at this scale, so cluster-based distributed big data systems have emerged. Among them, Hadoop and Spark are the most popular distributed computing frameworks. Compared with Hadoop, which is based on disk and grid computing, Spark can run up to 100 times faster, which significantly reduces data transfer and computing time. As Spark develops rapidly, how to make full use of its memory resources and improve the efficiency of memory usage is a problem worth studying.

This thesis studies Spark memory management in depth, including the principles of the Spark framework and its source code. A performance analysis system is designed to collect information about application execution, so that the strengths and weaknesses of a memory management scheme can be judged. For the two different memory management schemes available in Spark 1.6 and later versions, the similarities and differences between the new and old schemes are studied to identify a deficiency in the memory allocation ratio of the new scheme. Typical distributed experiments verify this deficiency, and the new scheme is then improved so that it runs successfully with a small amount of memory.

The analysis of the new and old memory management schemes shows that both are built on a static allocation algorithm. To address the inherent defects of static allocation, this thesis proposes an adaptive memory dynamic allocation algorithm based on overflow history (AMDAAH). The algorithm dynamically adjusts the proportion of each memory type according to the relative amount of each memory type needed at run time, which improves the efficiency of memory usage. AMDAAH is compared with the first-come, first-served algorithm and the static allocation algorithm. Experiments show that AMDAAH achieves better memory usage efficiency on two different types of applications, SparkPi and PageRank, and the best overall performance.
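The abstract does not give the details of AMDAAH, so the following is only a minimal Python sketch of the general idea of adjusting a memory split from overflow history. It is not the thesis's algorithm. The class name `AdaptiveSplit`, the two regions (storage and execution), the fixed step size, the sliding window, and the clamping bounds are all assumptions introduced for illustration.

```python
from collections import deque


class AdaptiveSplit:
    """Hypothetical two-region memory split driven by overflow history.

    Illustrative only: not the AMDAAH algorithm from the thesis.
    """

    def __init__(self, total_mb, storage_fraction=0.5, step=0.05,
                 min_fraction=0.1, window=10):
        self.total_mb = total_mb
        self.storage_fraction = storage_fraction
        self.step = step                  # assumed fixed adjustment per rebalance
        self.min_fraction = min_fraction  # assumed floor for either region
        # Each entry is (storage_overflow_mb, execution_overflow_mb).
        self.history = deque(maxlen=window)

    def record(self, storage_need_mb, execution_need_mb):
        """Log how far each region's demand exceeded its current quota."""
        storage_quota = self.total_mb * self.storage_fraction
        execution_quota = self.total_mb - storage_quota
        self.history.append((
            max(0.0, storage_need_mb - storage_quota),
            max(0.0, execution_need_mb - execution_quota),
        ))

    def rebalance(self):
        """Shift the split one step toward the region that overflowed more."""
        storage_over = sum(s for s, _ in self.history)
        execution_over = sum(e for _, e in self.history)
        if storage_over > execution_over:
            self.storage_fraction += self.step
        elif execution_over > storage_over:
            self.storage_fraction -= self.step
        # Clamp so that neither region is starved completely.
        upper = 1.0 - self.min_fraction
        self.storage_fraction = round(
            min(upper, max(self.min_fraction, self.storage_fraction)), 10)
        return self.storage_fraction


# Usage: an execution-heavy workload keeps overflowing the execution region,
# so the storage share shrinks by one step per rebalance.
split = AdaptiveSplit(total_mb=1000)
for _ in range(3):
    split.record(storage_need_mb=200, execution_need_mb=700)
    split.rebalance()
```

A static allocation would keep the initial 50/50 split no matter how often one region overflows. In this sketch the recorded overflows move the boundary toward the region under pressure, and the sliding window lets old overflows age out.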
Keywords/Search Tags: Spark framework, Memory management, AMDAAH, Performance analysis, Hadoop