
Research On Cache Policy And Performance Optimization For Tiered Big Data Storage System

Posted on: 2018-01-24
Degree: Master
Type: Thesis
Country: China
Candidate: P Shu
Full Text: PDF
GTID: 2428330512498264
Subject: Computer Science and Technology
Abstract/Summary:
In the big data era, the explosive growth of data volume and the wide variety of application requirements have driven the rapid development of big data technology. At the computing level, the emergence of stream processing, SQL querying, and graph computing has effectively improved the performance of big data analysis and processing in recent years. At the storage level, because disk-based big data storage systems perform poorly, some researchers have begun developing memory-centric distributed storage systems to accelerate the I/O of the big data applications running above them. Alluxio is a typical example. Alluxio provides a tiered storage system based on a MEM-SSD-HDD architecture in order to expand the overall storage space. Efficient cache policies and scheduling algorithms are therefore important for improving the performance of complex big data applications running on top of tiered storage.

Drawing on the tiered storage design in Alluxio, we propose a general cache scheduling framework that makes it convenient for users to customize more efficient cache policies. The framework applies to any tiered storage system composed of MEM, SSD, HDD, and other storage devices. On this basis, through an analysis of cache policies in big data storage systems, we propose a set of cache eviction policies covering different access patterns to meet the demands of different big data applications. In addition, through an analysis of data access patterns in big data applications, we propose two efficient scheduling algorithms that select eviction policies adaptively in different application scenarios, further improving the performance of the applications running above the storage layer.

The primary contributions of this thesis are as follows.

(1) We propose an extensible cache scheduling framework for tiered big data storage systems. Users can conveniently customize more efficient cache policies on top of the framework in order to improve the performance of more complex big data applications.

(2) Based on the cache scheduling framework, we implement a set of cache eviction policies covering different access patterns, including LRU, LRFU, LIRS, and ARC. We evaluate these policies with several widely used big data applications and find that each policy can accelerate big data applications in specific scenarios.

(3) Based on the cache scheduling framework, we also implement an efficient cache promotion policy that automatically promotes hot blocks from low-performance storage devices to memory. The experimental results show that this promotion policy effectively accelerates big data applications running on top of the tiered storage system.

(4) Because no single cache eviction policy can adapt to all application scenarios, we propose two scheduling algorithms, one based on hit rate and one based on data access patterns. The hit-rate-based algorithm selects a cache eviction policy adaptively for real-time big data applications that are run only once, while the access-pattern-based algorithm selects a cache eviction policy adaptively for routine big data applications that are run repeatedly. The experimental results show that the scheduling algorithms outperform the cache eviction policies used individually.

(5) Based on the general framework and cache policies above, we implement a prototype in Alluxio that includes the general cache scheduling framework, the cache policies, and the scheduling algorithms. The experimental results show that the prototype significantly improves the performance of Alluxio and greatly accelerates the big data applications running on top of it. In addition, the cache eviction policies we designed have been contributed to the Alluxio open source project and are already in use by some companies.
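
As a rough illustration of how the eviction policy in contribution (2) and the promotion policy in contribution (3) can interact, the following Python sketch models a two-tier block store. It is not the thesis's implementation and does not use any Alluxio API: the class `TieredCache`, the parameter `promote_threshold`, and the rule "promote after a fixed number of slow-tier accesses" are all assumptions made for the example, and plain LRU stands in for the richer policies (LRFU, LIRS, ARC) named above.

```python
from collections import OrderedDict


class TieredCache:
    """Hypothetical two-tier block cache: a small MEM tier over a slow tier."""

    def __init__(self, mem_capacity, promote_threshold=2):
        self.mem_capacity = mem_capacity            # max blocks held in MEM
        self.promote_threshold = promote_threshold  # assumed promotion rule
        self.mem = OrderedDict()  # block_id -> data, least recently used first
        self.slow = {}            # block_id -> data (stands in for SSD/HDD)
        self.slow_hits = {}       # block_id -> accesses while in the slow tier

    def put(self, block_id, data):
        """Place a block in MEM, demoting LRU blocks if MEM overflows."""
        self.slow.pop(block_id, None)
        self.slow_hits.pop(block_id, None)
        self.mem[block_id] = data
        self.mem.move_to_end(block_id)  # mark as most recently used
        self._evict_if_needed()

    def get(self, block_id):
        """Return (data, tier served from), or (None, None) if absent."""
        if block_id in self.mem:
            self.mem.move_to_end(block_id)
            return self.mem[block_id], "MEM"
        if block_id in self.slow:
            data = self.slow[block_id]
            hits = self.slow_hits.get(block_id, 0) + 1
            self.slow_hits[block_id] = hits
            if hits >= self.promote_threshold:
                self.put(block_id, data)  # promote the hot block to MEM
            return data, "SLOW"
        return None, None

    def _evict_if_needed(self):
        # LRU eviction: demote the least recently used block to the slow tier.
        while len(self.mem) > self.mem_capacity:
            victim, data = self.mem.popitem(last=False)
            self.slow[victim] = data
            self.slow_hits[victim] = 0
```

In this sketch a block that is read repeatedly from the slow tier moves back into memory, and the block it displaces is chosen by the eviction policy. Swapping `_evict_if_needed` for a different victim-selection rule is the kind of customization a pluggable scheduling framework would allow.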
Keywords/Search Tags:Big Data, Tiered Big Data Storage System, Cache Scheduling Framework, Cache Eviction Policy, Cache Promotion Policy, Alluxio