
Research and Implementation of an Optimization Technique for Spark Execution Plans Based on Computation Reuse

Posted on: 2018-06-07
Degree: Master
Type: Thesis
Country: China
Candidate: C Q Huang
GTID: 2428330569498784
Subject: Software engineering
Abstract/Summary:
With the continuing decline in the cost of distributed computing and the growth in data volumes, more and more data analysis and data mining applications can be completed quickly and at low cost. As an in-memory distributed parallel computing framework that supports online data analysis, Spark has been widely studied and applied in both industry and academia. This thesis focuses on Spark performance optimization in scenarios where large numbers of applications execute continuously. Drawing on the computation reuse that potentially exists among such applications, and on semantic caching techniques from traditional databases, it studies an optimization technique for Spark execution plans based on computation reuse, and then systematically designs and implements it on top of the Spark open-source software. The main contributions are as follows.

1. Based on an in-depth analysis of related work, a Spark execution plan optimization model built on the idea of semantic caching is proposed. 1) Using the data analysis semantics contained in the DAG of a Spark execution plan, an algorithm is designed to identify computation reuse between a Spark application and the cache; with this algorithm, the common computing subsets that potentially exist between a Spark application and cached results can be found. 2) Following the idea of semantic caching, parts of a Spark application's intermediate results are cached and shared with subsequent applications; a semantic cache management model is studied that uses the gain rate of computing cost to optimize the selection, organization, and management of the cache.

2. A Spark execution plan optimization subsystem is designed and implemented in Spark Core and combined with a reuse-based Spark scheduling subsystem to form a reuse-based Spark optimization system. Experiments show that the proposed optimization scheme improves the performance of the Spark system.
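The reuse-identification idea in point 1.1 can be sketched as matching canonical signatures of lineage sub-plans: two sub-plans that apply the same operators with the same parameters to the same source compute the same intermediate result, so a signature hit against the cache marks a reusable computation subset. The sketch below is illustrative only and is not the thesis's actual algorithm or a Spark API; `PlanNode`, `signature`, and `find_reusable` are hypothetical names.

```python
# Minimal sketch: identify computation reuse between a new application's
# lineage DAG and previously cached sub-plans via canonical signatures.
# All names here are illustrative, not Spark APIs.
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanNode:
    op: str                # e.g. "textFile", "filter", "reduceByKey"
    params: str            # serialized operator parameters / closures
    parents: tuple = ()    # upstream PlanNodes


def signature(node: PlanNode) -> str:
    """Canonical hash of the sub-plan rooted at `node`; equal signatures
    mean the sub-plans compute the same intermediate result."""
    parent_sigs = ",".join(signature(p) for p in node.parents)
    text = f"{node.op}({node.params})[{parent_sigs}]"
    return hashlib.sha1(text.encode()).hexdigest()


def find_reusable(app_dag: list, cache_index: dict) -> dict:
    """Map each node of the new application's DAG to a cached result
    with a matching signature, i.e. the shared computation subset."""
    return {n: cache_index[signature(n)]
            for n in app_dag if signature(n) in cache_index}
```

A later application whose DAG shares a prefix (same source, same filter) with an earlier cached one would then hit the cache at that node and only recompute the operators above it.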
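The "gain rate of computing cost" criterion in point 1.2 can be read as valuing a cached intermediate result by the compute time it saves, weighted by how often it is reused and normalized by its storage footprint. The following is a hedged sketch under that reading, not the thesis's actual model; the formula and all names (`CacheCandidate`, `gain_rate`, `select_for_cache`) are assumptions.

```python
# Sketch: select which intermediate results to cache under a storage budget,
# greedily preferring the highest gain rate. Formula and names are assumed.
from dataclasses import dataclass


@dataclass
class CacheCandidate:
    name: str
    compute_cost: float   # seconds to recompute this intermediate result
    reuse_count: int      # observed or predicted reuses by later applications
    size_mb: float        # storage footprint when materialized


def gain_rate(c: CacheCandidate) -> float:
    # compute time saved per MB of storage spent
    return (c.compute_cost * c.reuse_count) / c.size_mb


def select_for_cache(candidates, budget_mb):
    """Greedily keep the highest gain-rate entries within the budget."""
    chosen, used = [], 0.0
    for c in sorted(candidates, key=gain_rate, reverse=True):
        if used + c.size_mb <= budget_mb:
            chosen.append(c.name)
            used += c.size_mb
    return chosen
```

The same score can drive eviction: when the cache is full, the entry with the lowest gain rate is the cheapest to recompute relative to the space it frees.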
Keywords/Search Tags:Semantic Cache, Spark, Calculation Reusing, Scheduling optimization