
Metadata Management For Parallel File Systems

Posted on: 2018-01-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L X Wang
Full Text: PDF
GTID: 1368330569498446
Subject: Computer Science and Technology
Abstract/Summary:
The rapid growth of human society has improved our ability to acquire information, and a series of applications have emerged that generate large volumes of diverse data. Large-scale computing clusters face the challenge of handling massive amounts of unstructured data. Metadata-intensive workloads are typical on such clusters, yet they are not well served by current parallel file systems at very large scale. As a result, high-performance, reliable, and scalable metadata management is an urgent and key issue for parallel file systems. Building on the current architecture of parallel file systems, this thesis focuses on metadata management for metadata-intensive workloads. We present a hybrid file system architecture, together with a series of optimizations for metadata distribution, path resolution, metadata indexing, small-file I/O, and a scalable directory service. The main work and contributions of this thesis are as follows:

1. An architecture for a hybrid parallel file system for metadata-intensive workloads (Chapter 2). Current parallel file systems handle metadata-intensive workloads poorly because of their metadata management methods. This thesis presents the architecture of our hybrid parallel file system, MoonFS. MoonFS is designed to provide a global, consistent file system view and efficient metadata performance. Metadata and small-file requests are handled by the metadata management module of MoonFS, while large files are mapped to regular files in the underlying shared parallel file system. Small, random, and slow updates are packed into large on-disk files, which facilitates sequential allocation and large transfers.

2. A metadata management method based on stateless client caching and server-side directory replication (Chapter 3). To improve metadata performance in environments with multiple metadata servers, this thesis presents a novel metadata management method based on stateless client caching and server-side directory replication. It uses a dynamic namespace partitioning mechanism, based on consistent hashing and working at directory granularity, to distribute both directories and directory entries across several metadata servers while maintaining directory locality and load balance. At the same time, MoonFS maintains a stateless directory cache in each client and adopts a replicated directory mechanism in each metadata server to reduce the overhead of path resolution and permission checking. The experimental results indicate that this method solves the RPC amplification problem in path resolution and permission checking and improves the metadata performance of MoonFS.

3. A metadata and small-file indexing method based on the LSM-tree (Chapter 4). Current metadata management methods typically use a B-tree or a copy-on-write tree to index metadata. These read-optimized data structures perform poorly under workloads dominated by massive numbers of simultaneous metadata requests. This thesis presents an LSM-tree-based indexing method that partitions the namespace into several columns on a per-directory basis. It describes in detail the mapping between metadata operations and LSM-tree operations, namespace partitioning, metadata mapping, metadata storage, and metadata operation optimization. A write-ordered, log-structured layout is also used to store small files efficiently, reducing the overhead of LSM-tree compactions. The experimental results show that these optimizations index metadata and small files more efficiently than other index structures.

4. A scalable directory service based on an optimistic locking mechanism (Chapter 5). Current metadata management methods perform poorly when handling ultra-large directories. This thesis presents a distributed, scalable directory service based on an optimistic locking mechanism. With concurrent and unsynchronized partition splitting, our method distributes a large directory across several metadata servers while maintaining load balance among them. At the same time, clients can gracefully tolerate stale mappings while the system maintains correctness and consistency. The experimental results show that our scalable directory service improves metadata performance when handling large directories.
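
The abstract describes namespace partitioning based on consistent hashing at directory granularity. The following is a minimal sketch of that general idea and not the MoonFS implementation. The class name `DirectoryRing`, the server names, the use of virtual nodes, the SHA-1 hash, and the rule that a directory entry is placed on the server that owns its parent directory are all assumptions made for illustration.

```python
import bisect
import hashlib
import posixpath


def ring_hash(key: str) -> int:
    """Map a string to a 64-bit point on the hash ring (hash choice is an assumption)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")


class DirectoryRing:
    """Hypothetical consistent-hash ring that places whole directories on metadata servers."""

    def __init__(self, servers, vnodes=64):
        self._vnodes = vnodes
        self._points = []   # sorted (ring position, server name) pairs
        self._hashes = []   # ring positions only, kept in the same order for bisect
        for server in servers:
            self.add_server(server)

    def _rebuild(self):
        self._points.sort()
        self._hashes = [h for h, _ in self._points]

    def add_server(self, server):
        # Each server owns several virtual nodes to even out the load.
        for i in range(self._vnodes):
            self._points.append((ring_hash(f"{server}#{i}"), server))
        self._rebuild()

    def remove_server(self, server):
        self._points = [p for p in self._points if p[1] != server]
        self._rebuild()

    def server_for_directory(self, dir_path: str) -> str:
        """Return the first server clockwise from the directory's ring position."""
        if not self._points:
            raise LookupError("no metadata servers in the ring")
        idx = bisect.bisect_right(self._hashes, ring_hash(dir_path)) % len(self._points)
        return self._points[idx][1]

    def server_for_entry(self, path: str) -> str:
        """Assumed locality rule: an entry lives with its parent directory."""
        parent = posixpath.dirname(path) or "/"
        return self.server_for_directory(parent)


if __name__ == "__main__":
    ring = DirectoryRing(["mds0", "mds1", "mds2"])
    for p in ["/home/a/x.txt", "/home/a/y.txt", "/home/b/z.txt"]:
        print(p, "->", ring.server_for_entry(p))
```

In this sketch, all entries of one directory resolve to the same server, which is one way to preserve directory locality. When a server is added, only directories whose ring positions fall into the new server's ranges change owner, so most of the namespace stays in place.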
Keywords/Search Tags:Metadata-intensive workload, Hybrid parallel file system, Metadata Management, Metadata index, Scalable directory service, Log-structured merge tree