Research On Cache Techniques For Distributed File System

Posted on: 2020-11-07
Degree: Master
Type: Thesis
Country: China
Candidate: C J Li
Full Text: PDF
GTID: 2428330575458240
Subject: Computer Science and Technology
Abstract/Summary:
In the era of big data, the scale of data stored and processed by computers has exploded, and distributed storage and parallel computing technologies for big data have developed rapidly. Distributed memory-based file systems, represented by Alluxio, can bring significant performance improvements over traditional distributed storage. To improve the efficiency of data access, distributed storage systems usually adopt a caching mechanism. However, the caching mechanisms of existing distributed memory file systems handle two scenarios poorly: frequent small-scale data reads, and cache space shared among multiple tenants. First, under frequent random access to large files and repeated access to large numbers of small files, existing caching techniques still rely on server-side caching and do not take full advantage of client-side caching. Second, when multiple tenants share the server's cache space, existing cache-sharing algorithms struggle to balance fairness and efficiency. As a result, existing caching techniques for hierarchical distributed file systems cannot meet the needs of efficient small-scale data caching and multi-tenant shared cache space.

To address these problems, this thesis proposes a fine-grained client-side caching model based on a submodular optimization algorithm, two new multi-tenant cache-sharing strategies, and a complete caching framework. The main work and contributions of the thesis are:

(1) For client-side caching, this thesis designs a new fine-grained caching model that targets the inefficiency of caching small-scale data. The model can manage variable-length cache blocks that contain partially overlapping segments. Within this model, the caching problem is formulated as a submodular optimization problem. When dealing with sets of partially overlapping file fragments, the submodular optimization algorithm is used to identify hot data and to provide synchronous and asynchronous cache replacement and upgrade strategies.

(2) For server-side caching, this thesis proposes two multi-tenant cache-sharing algorithms: the Efficient Sharing based on Fairness (ESF) algorithm and the Proportion Fairness (PF) algorithm. The ESF algorithm jointly considers hit-rate attenuation, resource usage rate, and access to shared files. The PF algorithm satisfies the no-blame property, ensuring in real time that the sum of the reductions in user benefit is lower than the sum of the increases in user benefit.

(3) This thesis integrates the above designs into a multi-tenant caching framework that provides extensible caching mechanisms, support for multiple systems, and multi-tenant management. The framework consists of an application layer, a cache service layer, a middleware layer, and a storage layer. The application layer provides client-side caching; the cache service layer manages data and metadata and supports pluggable cache migration strategies; the middleware layer contains the external cache and dependency components; and the storage layer contains multiple underlying storage systems.

Experiments show that, for client-side caching, the fine-grained caching technique proposed in this thesis increases random read speed by about 4 times compared with server-side caching. For server-side caching, compared with existing cache-sharing algorithms and while maintaining high fairness, the proposed ESF and PF algorithms effectively improve the global hit ratio and achieve fairer performance under uneven access patterns.
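The abstract does not give the details of the submodular formulation, so the following is only an illustrative sketch of the general idea, not the thesis's algorithm. It assumes that the utility of a set of cached segments is the total access "heat" of the bytes they cover, counting each byte once even when segments overlap. That coverage function is submodular, so a greedy pass that repeatedly picks the segment with the best marginal gain per byte is a natural heuristic. The names `byte_heat` and `greedy_select`, the half-open `(start, end)` segment representation, and the byte budget are all hypothetical.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # half-open byte range [start, end) within one file


def byte_heat(accesses: List[Segment]) -> Dict[int, int]:
    """Count how many recorded reads touched each byte offset."""
    heat: Dict[int, int] = {}
    for start, end in accesses:
        for off in range(start, end):
            heat[off] = heat.get(off, 0) + 1
    return heat


def greedy_select(candidates: List[Segment],
                  accesses: List[Segment],
                  budget: int) -> List[Segment]:
    """Greedily pick segments to cache under a byte budget.

    Assumed utility: sum of heat over the union of cached bytes.
    Overlapping bytes are counted once, so a segment's marginal gain
    shrinks as more of it is already covered (diminishing returns).
    """
    heat = byte_heat(accesses)
    covered = set()          # byte offsets already cached
    chosen: List[Segment] = []
    remaining = budget
    pool = list(candidates)

    while pool:
        best, best_ratio = None, 0.0
        for start, end in pool:
            cost = end - start
            if cost <= 0 or cost > remaining:
                continue  # empty segment or does not fit in the budget
            gain = sum(heat.get(off, 0)
                       for off in range(start, end)
                       if off not in covered)
            ratio = gain / cost
            if ratio > best_ratio:
                best, best_ratio = (start, end), ratio
        if best is None:
            break  # nothing fits or nothing adds value
        chosen.append(best)
        covered.update(range(best[0], best[1]))
        remaining -= best[1] - best[0]
        pool.remove(best)
    return chosen


if __name__ == "__main__":
    reads = [(0, 4), (2, 6), (2, 6), (10, 12)]
    segments = [(0, 4), (2, 6), (10, 12)]
    print(greedy_select(segments, reads, budget=6))
```

In this toy input, the segment `(0, 4)` overlaps the hotter segment `(2, 6)`, so once `(2, 6)` is cached the remaining value of `(0, 4)` drops to its two uncovered bytes. A real client-side cache would track heat per range rather than per byte and would have to handle eviction as well as admission; this sketch covers selection only.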
Keywords/Search Tags: tiered distributed file system, client-side cache optimization, submodular function, cache sharing