
Client-oriented Highly Available And Scalable Metadata Service

Posted on: 2017-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: L X Ao
Full Text: PDF
GTID: 2308330485984558
Subject: Computer system architecture
Abstract/Summary:
Rapidly evolving big data analytics technologies call for innovations in storage techniques, especially in metadata processing. Large-scale distributed file systems rely on highly scalable and highly available metadata management, which current research fails to provide. This paper presents two pieces of work that address the availability and scalability problems, respectively.

First, the metadata availability problem. Highly available metadata management in distributed file systems is essential to applications. However, many existing high-availability metadata mechanisms ignore client-oriented considerations and treat all metadata indiscriminately, resulting in a single fault domain and inefficient resource usage. After investigating workload characteristics of Hadoop applications, this paper proposes Client-Oriented METadata (Comet), a novel highly available metadata design that treats the metadata working sets of different clients independently and distributes them into regions. These regions are isolated from each other and form their own fault domains, so a failure in one region has no influence on the others. A prototype of Comet was implemented on HDFS, and the experimental results show that Comet clearly improves overall HDFS metadata availability with acceptable performance degradation. It also provides increased performance and more efficient metadata recovery as the system scales out.

Second, the metadata scalability problem. The unscalable architecture of HDFS metadata can make the metadata service a capacity and performance bottleneck. Previous efforts to improve scalability either sacrifice metadata locality or perform poorly on directory renames. This paper designs and implements Partitioner, a distributed, scalable metadata service based on HDFS.
To support distributed metadata management, a novel dynamic metadata subtree partitioning method and a radix-tree-based partition lookup scheme are proposed. Load balancing and directory operations are also optimized to use resources more efficiently and to reduce migration costs, respectively. Experimental results indicate that this method expands metadata capacity and throughput, improving the scalability of the metadata service.

Building on the work above, this paper improves metadata availability and scalability to meet the metadata requirements of distributed file systems for next-generation big data applications. The client-oriented metadata design can be integrated with other metadata optimizations, and the scalable metadata design paves the way for metadata management in future exascale storage systems.
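The idea of looking up the partition that owns a path under subtree partitioning can be illustrated with a minimal sketch. It is not the thesis's implementation: it uses a plain trie keyed by path component (a true radix tree would additionally compress single-child chains), and the names `PartitionTable`, `assign`, `lookup`, and the server labels `mds-0`, `mds-1`, `mds-2` are hypothetical. A path is resolved to the deepest partition root that is a prefix of it.

```python
from typing import Dict, List, Optional


class _Node:
    """One path component in the lookup trie (hypothetical structure)."""

    def __init__(self) -> None:
        self.children: Dict[str, "_Node"] = {}
        # Set only on nodes that are the root of a partitioned subtree.
        self.server: Optional[str] = None


class PartitionTable:
    """Maps directory subtrees to metadata servers by longest-prefix match."""

    def __init__(self, root_server: str) -> None:
        self._root = _Node()
        self._root.server = root_server  # "/" always has an owner

    @staticmethod
    def _split(path: str) -> List[str]:
        # "/user/alice/" -> ["user", "alice"]; "/" -> []
        return [part for part in path.split("/") if part]

    def assign(self, subtree_root: str, server: str) -> None:
        """Record that the subtree rooted at subtree_root lives on server."""
        node = self._root
        for part in self._split(subtree_root):
            node = node.children.setdefault(part, _Node())
        node.server = server

    def lookup(self, path: str) -> str:
        """Return the server owning the deepest partition root above path."""
        node = self._root
        owner = node.server
        for part in self._split(path):
            node = node.children.get(part)
            if node is None:
                break  # no deeper partition roots along this path
            if node.server is not None:
                owner = node.server
        return owner


# Hypothetical layout: one subtree per server, with a nested partition.
table = PartitionTable(root_server="mds-0")
table.assign("/user/alice", "mds-1")
table.assign("/user/alice/logs", "mds-2")
```

In this sketch a lookup costs one dictionary probe per path component, and moving a subtree to another server only changes the label on one node, which is one reason a prefix-tree index is a natural fit for subtree partitioning.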
Keywords/Search Tags: distributed file systems, HDFS, metadata management, availability, scalability