Research On Efficient Cache Strategy Based On Data Access Characteristics

Posted on: 2022-06-07 | Degree: Master | Type: Thesis
Country: China | Candidate: C C Xu | Full Text: PDF
GTID: 2518306536963679 | Subject: Computer Science and Technology
Abstract/Summary:
Because a solid-state disk (SSD) offers better read and write performance than a hard disk drive (HDD), it is commonly used as a disk cache in computing systems. However, as modern workloads become more data-intensive, the demand for cache capacity has risen sharply. The limited capacity of the SSD is then insufficient to hold the hot data, which limits its usefulness as a cache. Data deduplication detects and removes identical data, so it can avoid writing and storing redundant cache data and thereby improve both the space utilization and the durability of the SSD. Building a deduplication-aware cache is therefore a feasible and effective solution.

An analysis of existing deduplication-aware caches shows two shortcomings. First, they use reference counts to guide cache replacement but do not effectively exploit the associations between data accesses, which seriously hinders improvements in cache performance. Second, they ignore the distribution of valid content within each access, which makes it difficult for the cache to capture the characteristics of the workload accurately.

In response to these problems, this thesis does the following work. First, it introduces and analyzes existing deduplication-aware caches in detail and elaborates on the two main issues: the failure to exploit the relevance of data accesses, and the neglect of the diversity of data access distributions.

Second, to address the ineffective use of data access relevance, we propose a cache replacement strategy based on relevance strength, named RAR. RAR redesigns the replacement rule for cached data according to changes in access frequency and reference count. RAR is also combined with a segmented cache structure, which gives better control over how long data blocks with higher relevance strength are retained in the cache.

Third, to address the neglect of the diversity of data access distributions, we propose a fine-grained management strategy based on access distribution, named MAD. MAD considers the characteristics of each data access request and extracts both the valid data content and the access behavior to manage the cached data, thereby avoiding misjudgment of the popularity of data in the cache.

Finally, we implement the two strategies on top of the traditional caching algorithms LRU and ARC; the resulting algorithms are called RM-LRU and RM-ARC, respectively. We evaluate them on multiple workloads. The results show that RM-LRU and RM-ARC outperform existing deduplication-aware cache algorithms, reducing the cache miss ratio by 23.91% and 29.98%, respectively. In addition, with the two strategies, the IOPS and the deduplication ratio of the cache increase by 50.27% and 2.67%, respectively.
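
To make the idea of a deduplication-aware cache concrete, the following is a minimal Python sketch. It is not the thesis's RAR or MAD strategy, whose details the abstract does not give. The class name `DedupCache`, the SHA-256 fingerprint, and the eviction score (reference count plus hit count, with ties broken in LRU order) are all assumptions chosen only for illustration.

```python
import hashlib
from collections import OrderedDict


class DedupCache:
    """Toy content-addressed cache: identical blocks are stored only once.

    Hypothetical illustration; not the RAR/MAD design from the thesis.
    Assumes capacity >= 1.
    """

    def __init__(self, capacity):
        self.capacity = capacity     # maximum number of unique blocks
        self.blocks = OrderedDict()  # fingerprint -> [data, ref_count, hits]
        self.addr_map = {}           # logical address -> fingerprint

    @staticmethod
    def fingerprint(data):
        # Assumed fingerprint function: SHA-256 over the block content.
        return hashlib.sha256(data).hexdigest()

    def _evict(self):
        # Assumed score: ref_count + hits. min() returns the first minimal
        # key in iteration order, so ties go to the least recently used.
        victim = min(self.blocks,
                     key=lambda fp: self.blocks[fp][1] + self.blocks[fp][2])
        del self.blocks[victim]
        # Drop every logical address that pointed at the evicted block.
        for addr in [a for a, fp in self.addr_map.items() if fp == victim]:
            del self.addr_map[addr]

    def put(self, addr, data):
        fp = self.fingerprint(data)
        old = self.addr_map.get(addr)
        if old == fp:
            self.blocks.move_to_end(fp)  # same content rewritten: refresh only
            return
        if old is not None:
            self.blocks[old][1] -= 1     # address no longer references old block
        if fp in self.blocks:
            self.blocks[fp][1] += 1      # duplicate content: share stored block
            self.blocks.move_to_end(fp)
        else:
            if len(self.blocks) >= self.capacity:
                self._evict()
            self.blocks[fp] = [data, 1, 0]
        self.addr_map[addr] = fp

    def get(self, addr):
        fp = self.addr_map.get(addr)
        if fp is None:
            return None                  # cache miss
        entry = self.blocks[fp]
        entry[2] += 1                    # count the hit
        self.blocks.move_to_end(fp)
        return entry[0]
```

In this sketch, two addresses that hold the same content occupy a single cache slot, and a block referenced by more addresses is less likely to be chosen as the eviction victim.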
Keywords/Search Tags: Storage cache, Data deduplication, Access characteristics, Cache strategy