Research And Implementation Of Memory Optimization Based On Parallel Computing Engine Spark

Posted on:2014-05-12

Degree:Master

Type:Thesis

Country:China

Candidate:L Feng

Full Text:PDF

GTID:2268330422960509

Subject:Computer Science and Technology

Abstract/Summary:


Using memory for parallel computing is a hot topic in this area. Compared with traditional methods that transfer data through disk and network, using memory greatly reduces data transfer time, especially for data-intensive jobs, which can be ten times faster. As this new framework develops, how to use relatively limited resources effectively while guaranteeing job running time is becoming an urgent problem.

The work of this dissertation is based on the parallel computing engine Spark and studies the cache usage behavior of parallel clusters. By analyzing and modeling memory usage behavior, cache use is automated and optimized. This increases running efficiency in resource-limited environments and improves stability across different cluster configurations. The main contributions are:

1	Automated the cache strategy by analyzing the semantics of the source code. The scheduler can identify valuable RDDs and put them in the cache, which avoids cache waste and relieves the burden on the programmer.
2	Optimized the cache replacement strategy by analyzing the semantics of the code to obtain detailed information about the job. This includes computing the size and weight of each RDD, optimizing the order of actions, forming a new replacement algorithm that combines a register allocation algorithm with weight information, and a multi-level cache mode. A preliminary analysis of cache behavior on heterogeneous clusters was also carried out.
3	Showed through various experiments that job running efficiency can be increased, and that suitability for different cluster environments can be improved as well. The work is also valuable for cache usage in other parallel systems.
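The two cache ideas in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's algorithm and not Spark's API: the abstract gives no weight formula, so weights are simply passed in, and the names `cache_candidates` and `WeightedCache` are hypothetical. The sketch assumes that a dataset is worth caching when more than one job depends on it, and that a full cache evicts its lowest-weight entries first.

```python
from collections import Counter


def cache_candidates(job_lineages):
    """Return datasets that more than one job depends on, with their use counts.

    job_lineages maps a job name to the set of dataset names in its lineage.
    Assumption: reuse across jobs is what makes a dataset worth caching.
    """
    uses = Counter(name for deps in job_lineages.values() for name in set(deps))
    return {name: count for name, count in uses.items() if count > 1}


class WeightedCache:
    """Toy size-bounded cache that evicts the lowest-weight entries first."""

    def __init__(self, capacity):
        self.capacity = capacity   # total size budget, arbitrary units
        self.used = 0              # size currently occupied
        self.entries = {}          # name -> (size, weight)

    def put(self, name, size, weight):
        """Cache a dataset. Return the evicted names, or None if rejected."""
        if name in self.entries:
            return []              # already cached, nothing to do
        free = self.capacity - self.used
        victims = []
        # Walk entries from lowest to highest weight and pick victims, but
        # never evict an entry that weighs at least as much as the newcomer.
        for other in sorted(self.entries, key=lambda n: self.entries[n][1]):
            if free >= size:
                break
            other_size, other_weight = self.entries[other]
            if other_weight >= weight:
                break
            victims.append(other)
            free += other_size
        if free < size:
            return None            # not enough room; the cache is unchanged
        for victim in victims:
            self.used -= self.entries.pop(victim)[0]
        self.entries[name] = (size, weight)
        self.used += size
        return victims
```

In this sketch a newcomer is admitted only if enough lower-weight data can be evicted to make room, so a cheap intermediate result cannot displace a dataset that is expensive to recompute. A fuller version would derive the weight from job information such as dataset size and remaining uses, which the abstract mentions but does not specify.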

Keywords/Search Tags:

parallel computing, cache, Spark, RDD


Related items

1	Research On Parallel Computing Based On Spark
2	Research On Memory Data Management Technology In Spark
3	Study Of MPI/GPU Parallel Computing Processing Mechanism On Spark
4	Research And Implementation Of Cache And Fault-Tolerance Optimization Strategy Based On Spark
5	Study On Dynamic Parallel Computing Based On Probabilistic Rough Set And Its Application In Spark Platform
6	Research On Spark Performance Optimization Technology For In-Memory Computing
7	Research On Memory Management And Cache Replacement Policies In Spark
8	Research And Implementation On Caching Strategy In Spark
9	The Research And Implementation Of Parallel Algorithm For Bayesian Text Classification Based Spark Computing Environment
10	Parallel Research Of GSP Algorithm Based On Spark