Research And Implementation Of Memory Optimization Based On Parallel Computing Engine Spark

Posted on: 2014-05-12  Degree: Master  Type: Thesis
Country: China  Candidate: L Feng  Full Text: PDF
GTID: 2268330422960509  Subject: Computer Science and Technology
Abstract/Summary:
Using memory for parallel computing is a hot topic in this area. Compared with traditional methods that transfer data through disk and network, keeping data in memory greatly reduces data transfer time; for data-intensive jobs in particular, it can be ten times faster. As this new framework develops, how to use relatively limited resources effectively while guaranteeing job running time is becoming an urgent problem.

The work of this dissertation is based on the parallel computing engine Spark and studies the cache usage behaviour of parallel clusters. By analyzing and modeling memory usage behaviour, cache use is automated and optimized, which improves running efficiency in resource-limited environments and stability across different cluster configurations. The main contributions are:

1. Cache strategy automation, implemented by analyzing the semantics of the source code. The scheduler can identify valuable RDDs and put them in the cache, which avoids cache waste and relieves the burden on the programmer.

2. An optimized cache replacement strategy, based on analyzing the code's semantics to obtain detailed information about the job. This includes computing the size and weight of each RDD, optimizing the order of actions, forming a new replacement algorithm that combines a register allocation algorithm with weight information, and a multi-level cache mode. Cache behaviour on heterogeneous clusters is also given an initial analysis.

3. Various experiments showing that job running efficiency can be increased, and that suitability for different cluster environments can be improved as well. The work is also valuable for cache usage in other parallel systems.
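The abstract does not give the replacement algorithm itself, so the following is only a minimal sketch of what a weight-aware replacement policy could look like, not the dissertation's method. Everything in it is an assumption made for illustration: the names `CachedRDD` and `WeightedCache`, the weight formula (recompute cost times remaining uses, divided by size), and the rule that a newcomer may only displace entries with a lower weight.

```python
from dataclasses import dataclass


@dataclass
class CachedRDD:
    """Hypothetical record describing one cacheable RDD."""
    name: str
    size: int              # estimated size, in arbitrary units
    recompute_cost: float  # assumed cost of rebuilding the RDD from lineage
    remaining_uses: int    # how many later actions still read this RDD

    @property
    def weight(self) -> float:
        # Assumed weighting: benefit of keeping the RDD per unit of cache space.
        return self.recompute_cost * self.remaining_uses / self.size


class WeightedCache:
    """Hypothetical fixed-capacity cache that evicts the lowest-weight RDDs first."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = {}  # maps RDD name -> CachedRDD

    def used(self) -> int:
        return sum(e.size for e in self.entries.values())

    def put(self, rdd: CachedRDD) -> list:
        """Try to cache rdd; return the names of the entries evicted to make room."""
        if rdd.remaining_uses == 0 or rdd.size > self.capacity:
            return []  # never read again, or can never fit
        free = self.capacity - self.used()
        evicted = []
        # Consider victims from least to most valuable.
        for victim in sorted(self.entries.values(), key=lambda e: e.weight):
            if free >= rdd.size:
                break
            if victim.weight >= rdd.weight:
                break  # every remaining entry is at least as valuable as the newcomer
            evicted.append(victim)
            free += victim.size
        if free < rdd.size:
            return []  # current contents are worth more; leave the cache unchanged
        for victim in evicted:
            del self.entries[victim.name]
        self.entries[rdd.name] = rdd
        return [v.name for v in evicted]
```

In this sketch, an RDD that will not be read again is never cached, and a low-weight newcomer cannot push out more valuable data. A register-allocation-style policy would additionally need the order of future uses, which is omitted here.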
Keywords/Search Tags:parallel computing, cache, Spark, RDD