| With the rapid development of Internet technology, the explosive growth of data places great pressure on file systems. Emerging non-volatile memory (NVM) offers byte addressability, low latency, read and write speeds close to those of main memory, and data persistence, which makes it a promising way to break the file system performance bottleneck. However, existing file systems suffer from poor scalability and low concurrency when handling massive data, making it difficult to fully exploit the advantages of NVM devices. In addition, although extendible hashing offers fast reads and writes and dynamic scalability, its hash directory and hash buckets also suffer from low concurrency, so combining it directly with a file system may not effectively improve concurrency. Therefore, this thesis optimizes extendible hashing and then the file system to improve file system concurrency, and studies and designs an NVM file system based on extendible hashing. First, the challenges facing current extendible hashing and file systems are analyzed. Extendible hashing is improved, and the file system's metadata management algorithm is optimized on top of it. Based on these improvements, the structure of an NVM file system based on extendible hashing is designed, comprising a high-concurrency extendible hashing module and a high-concurrency metadata management algorithm module integrated into the driver. By leveraging the fast read and write speeds of NVM storage devices, the concurrency of the file system is improved to meet the high-speed concurrent read and write requirements of massive-data applications such as Internet services. Second, to address the poor concurrency of the hash directory and the inefficient hash bucket management in current extendible hashing, a high-concurrency extendible hashing module designed for NVM is studied and 
presented. A hash directory based on lazy expansion is designed, which reduces the locking granularity during directory expansion to a single hash directory entry and thereby improves the concurrency of extendible hashing. An expansion rate calculation algorithm is designed to dynamically determine the size of the secondary expansion hash directory, which reduces the frequency of hash directory expansion and avoids unnecessary expansions. Additionally, a group-based hash bucket management algorithm is proposed, which decomposes the single hash bucket that hangs under a directory entry in existing extendible hashing into multiple buckets and adds a bucket directory. By changing how hash keys are managed within a hash bucket, it reduces both the locking granularity of key insertion and the likelihood of bucket splits, further improving the concurrency of extendible hashing. A directory recovery strategy based on hierarchical storage is also designed: hash directories and hash buckets are placed in DRAM and NVM so as to exploit the respective advantages of the two kinds of storage device, and the recovery strategy ensures the reliability of NEHASH. The high-concurrency extendible hashing prototype NEHASH is implemented on top of an open-source NVM storage device driver and evaluated with the YCSB testing tool. The results show that NEHASH outperforms existing hashing schemes such as CCEH, LEVEL, and CUCKOO in terms of concurrency. Specifically, in a multi-threaded environment NEHASH improves read throughput by up to 16.5% and write throughput by up to 19.3%. Third, to address the long I/O software stack of the file system and the low efficiency and poor concurrency of its metadata management algorithm, a high-concurrency metadata management algorithm integrated into the driver is studied and designed on the basis of the high-concurrency extendible hashing for NVM. To shorten the I/O software stack of the file system, the metadata management 
function of the file system is embedded in the NVM device driver, and the structure of the driver-integrated high-concurrency metadata management algorithm is given. A metadata key-value pair generation strategy based on Fs_simhash is designed: the access paths of files and directories are converted into hash values, and each key-value pair stores a hash value together with the storage address of the corresponding metadata. The Fs_simhash function is built on a locality-sensitive hash function, so that adjacent access paths have similar hash values, which supports better metadata storage and lookup performance. A metadata management policy based on NEHASH is designed. According to the different access characteristics of files and directories, the data buckets of NEHASH are decomposed, and the paths of files and directories are converted using different hash functions. This improves the efficiency of single-metadata queries while preserving the efficiency of locality-dependent operations such as range queries and traversals. Based on Intel’s open-source NVM storage device driver, HANVFS, a prototype of the driver-integrated high-concurrency metadata management algorithm, is implemented. Its read and write throughput and read and write bandwidth are tested with Filebench, Fio, and Iozone. The experimental results show that HANVFS can increase read and write throughput by 30.6% and read and write bandwidth by 24.7% compared with the NOVA file system loaded on PMEM. |
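
To illustrate the idea behind a locality-sensitive path hash such as Fs_simhash, the following is a minimal sketch of a generic SimHash over path components. It is not the Fs_simhash function from the thesis, whose details the abstract does not give. The 64-bit width, the use of BLAKE2b, the depth-tagged features, and the names `path_simhash` and `hamming` are all assumptions made for illustration.

```python
import hashlib

BITS = 64  # fingerprint width (assumption, not taken from the thesis)


def component_hash(feature: str) -> int:
    """Hash one feature string to a BITS-wide integer with BLAKE2b."""
    digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=BITS // 8).digest()
    return int.from_bytes(digest, "big")


def path_simhash(path: str) -> int:
    """Generic SimHash of a path: each component votes on every fingerprint bit."""
    votes = [0] * BITS
    components = [c for c in path.split("/") if c]
    for depth, name in enumerate(components):
        # Tag each component with its depth so "/a/b" and "/b/a" differ
        # (an illustrative choice, not something stated in the source).
        h = component_hash(f"{depth}:{name}")
        for bit in range(BITS):
            votes[bit] += 1 if (h >> bit) & 1 else -1
    fingerprint = 0
    for bit, vote in enumerate(votes):
        if vote > 0:  # majority of components set this bit
            fingerprint |= 1 << bit
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    base = path_simhash("/home/user/docs/report.txt")
    sibling = path_simhash("/home/user/docs/notes.txt")
    unrelated = path_simhash("/var/log/kernel/boot.log")
    print("sibling distance:  ", hamming(base, sibling))
    print("unrelated distance:", hamming(base, unrelated))
```

Because paths that share most of their components cast mostly identical votes, their fingerprints tend to differ in fewer bits than those of unrelated paths. This is the property the abstract relies on when it says adjacent access paths receive similar hash values. How the thesis maps such values onto NEHASH buckets is not shown here.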