Cache Data Management System For Distributed In-Memory Computing

Posted on: 2017-11-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Geng
GTID: 2348330503989857
Subject: Computer system architecture
Abstract/Summary:
Iterative applications such as graph processing and machine learning are now commonly used to solve big-data problems. Distributed in-memory computing systems such as Spark speed these applications up considerably by caching data and sharing it across iterations, and are therefore widely used in industry. When the data volume is large, however, not all cached data fits in memory, so a cache data management system is needed. Evicted data may be used again in later iterations and must then be recovered, which introduces recovery overhead; traditional replacement strategies such as LRU or FIFO cannot guarantee that this overhead is minimized. Experiments show that, in distributed in-memory computing systems, recovery cost varies markedly from one piece of cached data to another.

To reduce the impact of eviction, a cache management system that takes recovery overhead into account is proposed and implemented in Spark. First, because the execution logic of an application running in the distributed system is known in advance, the management system analyzes that logic to find the dependencies between cached data. Next, it defines the recovery cost of cached data to characterize eviction overhead, and reusability to characterize how many times the data will be reused in the remainder of the processing. Finally, it uses these two measures in an eviction strategy that maintains an eviction order over the cached data, so that the overhead is minimized whenever eviction occurs.

In the same experimental environment, this recovery-cost-aware cache management system was compared with Spark's default management strategy. The results show that when memory space is insufficient, the system reduces the overall running time of the application by 30% to 50%.
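The abstract does not give the exact formulas, so the following is only a minimal Python sketch of the idea under stated assumptions, not the thesis's implementation. It models the known execution plan as a lineage graph, takes the recovery cost of a cached block to be its own compute cost plus that of every uncached ancestor (each counted once), takes reusability to be the number of remaining reads in the plan, and evicts the block with the smallest product of the two. The names `Block`, `recovery_cost` and `choose_victims`, the sizes and costs in the example, and the "recovery cost x remaining uses" weighting are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical model of one cached dataset (e.g. an RDD) in the plan."""
    name: str
    size_mb: int
    compute_cost: float                  # assumed time to derive it from its parents
    parents: list = field(default_factory=list)
    remaining_uses: int = 0              # assumed reads left in the known plan


def recovery_cost(name, blocks, cached):
    """Cost of rebuilding `name` after eviction: its own compute cost plus that of
    every uncached ancestor on its lineage, each counted once."""
    total, stack, seen = 0.0, [name], set()
    while stack:
        cur = stack.pop()
        if cur in seen:
            continue
        seen.add(cur)
        total += blocks[cur].compute_cost
        # The walk stops at ancestors that are still held in memory.
        stack.extend(p for p in blocks[cur].parents if p not in cached)
    return total


def choose_victims(blocks, cached, needed_mb, capacity_mb):
    """Evict the block with the smallest expected overhead
    (recovery cost x remaining uses) until `needed_mb` fits."""
    remaining = set(cached)
    used = sum(blocks[n].size_mb for n in remaining)
    victims = []
    while used + needed_mb > capacity_mb and remaining:
        # Recompute each round: evicting a parent makes its children costlier to rebuild.
        victim = min(
            sorted(remaining),  # sorted for a deterministic tie-break
            key=lambda n: recovery_cost(n, blocks, remaining) * blocks[n].remaining_uses,
        )
        remaining.discard(victim)
        victims.append(victim)
        used -= blocks[victim].size_mb
    return victims


# Hypothetical plan: "input" is source data that is never cached.
blocks = {b.name: b for b in [
    Block("input", 0, 10.0),
    Block("parsed", 40, 5.0, ["input"], 2),
    Block("features", 40, 30.0, ["parsed"], 4),
    Block("lookup", 20, 1.0, ["input"], 1),
    Block("scratch", 20, 2.0, ["parsed"], 0),
]}
cached = {"parsed", "features", "lookup", "scratch"}
print(choose_victims(blocks, cached, needed_mb=60, capacity_mb=140))
# ['scratch', 'lookup']
```

In this example `scratch` goes first because it is never read again, and `lookup` next because it is cheap to rebuild, while the expensive and heavily reused `features` stays in memory. A recency-only policy such as LRU would not take either quantity into account.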
Keywords/Search Tags: In-Memory Computing, Distributed Processing, Cache Management, Eviction Strategy