
Research And Design On Hybrid Storage Architecture Oriented To Massive Small Files

Posted on: 2020-04-12
Degree: Master
Type: Thesis
Country: China
Candidate: H W Du
GTID: 2428330611998711
Subject: Computer Science and Technology
Abstract/Summary:
In the mobile Internet environment, data access is highly concurrent and highly random, hotspots change quickly, and accesses are strongly correlated, which leads to serious performance bottlenecks in the data center. Hybrid storage systems can effectively improve the I/O performance of storage systems by deploying a data cache mechanism with a high hit rate. However, the access characteristics of massive small files turn the traditional cache architecture, replacement strategy, file association, and heat analysis algorithms into bottlenecks. This paper studies existing hybrid storage architectures in depth and performs statistical analysis of data access features. Starting from data access correlation and SSD hardware features, the research designs an efficient and accurate data group prefetching algorithm and a hierarchical cache replacement strategy. The research implements the storage architecture on OpenStack Swift and validates the proposed architecture and algorithms using a widely used dataset.

First, this paper proposes a definition of the cache transaction and a method for constructing cache transaction features, which improves attribute coverage and computability compared with traditional feature representation methods. Second, it proposes a file group prefetching algorithm for hybrid storage systems based on cache transactions. Through a data pre-blocking algorithm and a high-association-priority data grouping algorithm, access relationships are mined at different granularities, which improves the efficiency and accuracy of relation mining; a data merge storage strategy is also proposed to improve the access efficiency of data groups. Third, the paper proposes a real-time access heat analysis algorithm for massive small files, which improves the efficiency of access heat indexing and reduces memory usage. Finally, this paper designs a data layout strategy and a replacement strategy for each cache level, based on the access characteristics of massive small files and the hardware characteristics of the cache devices at each level. These strategies reduce the frequency of SSD cache replacement for hot files and improve overall access efficiency, thus reducing SSD hardware wear.

The experimental results show the correctness and rationality of the proposed algorithms and strategies, which demonstrates the effectiveness of the proposed hybrid storage architecture. The models and algorithms proposed in this paper address key issues of the hybrid storage architecture and greatly improve the data access efficiency of the hybrid storage system in massive small file environments, and thus improve the concurrency and throughput of the storage system. In addition, this paper provides theoretical and practical references and evidence for subsequent related research.
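
The abstract does not give the details of the access heat analysis algorithm. As a rough illustration of what real-time heat tracking for many small files can look like, the sketch below uses an exponentially decayed access counter, a common generic technique that is not claimed to be the thesis's method. The class name `HeatTracker`, its methods, and the `half_life` parameter are all hypothetical names introduced for this example.

```python
import math


class HeatTracker:
    """Exponentially decayed access counter.

    Illustrative assumption only: this is a generic heat metric,
    not the algorithm proposed in the thesis.
    """

    def __init__(self, half_life: float):
        # After `half_life` time units without access, heat halves.
        self.decay = math.log(2) / half_life
        # file_id -> (heat at last access, time of last access)
        self._state = {}

    def record(self, file_id, now: float) -> float:
        """Register one access at time `now` and return the updated heat."""
        heat, last = self._state.get(file_id, (0.0, now))
        heat = heat * math.exp(-self.decay * (now - last)) + 1.0
        self._state[file_id] = (heat, now)
        return heat

    def heat(self, file_id, now: float) -> float:
        """Return the decayed heat at time `now` without recording an access."""
        if file_id not in self._state:
            return 0.0
        heat, last = self._state[file_id]
        return heat * math.exp(-self.decay * (now - last))

    def hottest(self, k: int, now: float):
        """Return the k file ids with the highest heat at time `now`."""
        ranked = sorted(self._state, key=lambda f: self.heat(f, now), reverse=True)
        return ranked[:k]


tracker = HeatTracker(half_life=10.0)
tracker.record("a.jpg", now=0.0)
tracker.record("b.jpg", now=0.0)
tracker.record("b.jpg", now=5.0)
candidates = tracker.hottest(1, now=5.0)  # files worth keeping in the fast tier
```

Only one pair of numbers is stored per file and each update is constant time, which is the kind of property that matters when the number of small files is very large. A cache layer could use `hottest` to decide which files stay on the SSD tier, so that hot files are replaced less often.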
Keywords/Search Tags: Hybrid Storage System, Hierarchical Cache Architecture, Cache Replacement Strategy, Access Correlation Mining, Access Heat Analysis