
Metadata Management Optimization In Distributed File Systems

Posted on: 2020-05-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y X Chen
GTID: 1368330578981667
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of Internet technology and the arrival of the data age, the amount of data has grown exponentially, and data storage technology has an ever deeper impact on application services. Distributed file systems play a vital role in storage systems because of their high reliability and scalability, their support for shared file storage, and their sophisticated concurrent access control. A distributed file system usually consists of three parts: metadata servers, data servers, and clients. Metadata describes file system and file characteristics, such as file type, file size, access permissions, and the data index. Before accessing file data, a user must first access the file's metadata to obtain the file's basic attributes and the index information of its data. More than 50% of file operations in distributed file systems are metadata operations, so metadata access performance is critical. This dissertation analyzes and optimizes the metadata access process and management scheme of existing distributed file systems from three aspects: the metadata prefetching mechanism, the load balancing strategy of the metadata server cluster, and the metadata management scheme. The main research contents and contributions of this dissertation are as follows.

(1) Data Correlations-Directed Metadata Prefetching Method

In many application scenarios, the locality of the workload causes multiple files to be accessed together; that is, there are access correlations between files. If the distributed file system detects the correlations between files in advance, the metadata of correlated files can be prefetched from the metadata server to the client. Metadata prefetching therefore reduces the number of metadata I/Os in the system, relieves the load on the metadata server, and shortens the processing path of a metadata request. However, existing metadata prefetching strategies mainly mine sets of frequently accessed files offline from the file access history, which is very restrictive and makes it difficult to adjust the correlations dynamically according to the system load. To solve these problems, this dissertation considers the correlations between files from a new perspective and proposes a data correlations-based metadata prefetching mechanism, SMeta. SMeta discovers the correlations present in file data through a lightweight pattern matching algorithm and reuses the metadata extended attribute space to store them, which avoids introducing additional metadata synchronization operations or modifying system APIs. In addition, SMeta introduces an efficient client-side adaptive feedback mechanism to improve prefetching accuracy. This dissertation implements a prototype system based on Ceph and evaluates it with metadata-intensive benchmarks and real-world workloads. The experimental results show that, compared with Ceph, SMeta reduces the number of metadata requests in the system by 58.5-87.8% and achieves 10.5 times the metadata access throughput and 2.75 times the linear client scalability. Compared with the access correlations-based prefetching scheme, SMeta further improves metadata access performance.

(2) Load Balancing in Metadata Server Cluster

A metadata server cluster needs a load balancing mechanism to keep the cluster load evenly distributed and to improve both the overall resource utilization of the cluster and the concurrent performance of the metadata service. However, existing load balancing strategies for metadata server clusters have three problems. First, they only consider load balancing at the logical layer of the metadata server daemons, so it is difficult to adjust the balancing scheme dynamically according to the metadata server cluster architecture. Second, the balancing decision is based solely on the temporal locality of the system workload, which is too simplistic and makes it difficult to adjust the decision dynamically according to workload characteristics. Third, blocking metadata migration with a two-phase commit consistency protocol generates too many migration messages and locks the directory being migrated, which severely blocks client metadata requests and degrades the metadata performance of the system. To address these problems, this dissertation proposes a new load balancing strategy based on the two-tier architecture of the metadata server cluster and implements a prototype system, Fim. Fim reduces metadata migration time by introducing an intra-node IPC communication scheme that accelerates intra-node messaging, combined with a scheduling scheme that gives priority to intra-node migration. Fim also takes the system workload characteristics into account when selecting the directory to migrate, which further improves the efficiency of load migration. Finally, Fim reduces the impact of metadata migration on client metadata requests by processing migration messages concurrently with client metadata requests and by introducing a non-blocking metadata migration method. The experimental results show that Fim effectively shortens metadata migration time and improves the accuracy of metadata migration. Compared with Ceph, Fim reduces the preprocessing latency of the ImageNet dataset by 50%.

(3) Hybrid Metadata Management in Distributed File Systems

Metadata management establishes the mapping between the file system namespace and the metadata server cluster, and is also responsible for regulating the load balance of the metadata server cluster. Existing metadata management schemes fall into two categories: subtree partitioning and hash-based mapping. Subtree partitioning splits the file system directory tree into multiple directory subtrees and distributes them across the metadata server cluster. Hash-based mapping distributes metadata according to the hash of each file's unique identifier. However, neither of these two traditional methods can effectively provide directory locality and load balancing at the same time. This dissertation proposes a hybrid metadata management scheme and implements a prototype system, SmartM2. SmartM2 preserves the directory locality of the file system by using subtree partitioning among the metadata server nodes, and it distributes each subtree's metadata uniformly across the multiple metadata server daemons inside a node with a hash-based mapping, which balances the load between the daemons within the node. At the same time, SmartM2 introduces the intra-node IPC communication scheme to accelerate metadata transmission between the metadata server daemons in a node, which compensates for the loss of directory locality caused by the hash-based mapping. In addition, when the size of the metadata server cluster changes, SmartM2 confines the impact of hash remapping to a single metadata server node, which reduces the total amount of metadata migrated because of remapping and shortens the latency of metadata migration within the node. The experimental results show that SmartM2 effectively balances directory locality and load balancing. Compared with Ceph, SmartM2 achieves 3.9× the metadata access throughput. When the metadata server cluster is scaled, SmartM2 reduces the latency of metadata migration by 74.7-92.6% compared with the hash-based mapping management method.
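The two-level placement described in part (3) can be illustrated with a minimal Python sketch. It is not SmartM2's implementation: the class `HybridMapper`, its method names, the node names, and the use of SHA-256 are all assumptions made for illustration. Subtree roots are assigned to nodes by longest-prefix match (preserving directory locality across nodes), and a hash of the file identifier picks a daemon inside the chosen node.

```python
import hashlib


class HybridMapper:
    """Hypothetical two-level metadata placement (illustration only)."""

    def __init__(self, subtree_to_node, daemons_per_node):
        # subtree_to_node: subtree root path -> node name (subtree partitioning)
        # daemons_per_node: node name -> number of metadata daemons on that node
        self.subtree_to_node = dict(subtree_to_node)
        self.daemons_per_node = dict(daemons_per_node)

    def node_for(self, path):
        """Pick the node owning the deepest subtree root that contains path."""
        best = None
        for root in self.subtree_to_node:
            prefix = root.rstrip("/") + "/"
            if path == root or path.startswith(prefix):
                if best is None or len(root) > len(best):
                    best = root
        if best is None:
            raise KeyError(f"no subtree covers {path!r}")
        return self.subtree_to_node[best]

    def daemon_for(self, path, file_id):
        """Return (node, daemon index); the hash is only used inside the node."""
        node = self.node_for(path)
        digest = hashlib.sha256(str(file_id).encode()).digest()
        index = int.from_bytes(digest[:8], "big") % self.daemons_per_node[node]
        return node, index


if __name__ == "__main__":
    mapper = HybridMapper(
        {"/": "node0", "/home": "node1"},   # assumed subtree layout
        {"node0": 2, "node1": 4},           # assumed daemon counts
    )
    print(mapper.daemon_for("/home/alice/notes.txt", file_id=1001))
    print(mapper.daemon_for("/etc/hosts", file_id=1002))
```

Because the hash is taken modulo the daemon count of a single node, changing the number of daemons on one node in this sketch leaves the placement on every other node untouched, which mirrors the abstract's claim that remapping is confined to one metadata server node.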
Keywords/Search Tags: Distributed File System, Metadata Management, Metadata Prefetching, Data Correlations, Load Balancing, Metadata Migration