
Research And Implementation Of Distributed Meta-data Management Framework For HDFS

Posted on: 2012-04-10 | Degree: Master | Type: Thesis
Country: China | Candidate: S N Han | Full Text: PDF
GTID: 2298330467976387 | Subject: Computer software and theory
Abstract/Summary:
Cloud computing has recently attracted increasing attention both at home and abroad, and Hadoop is regarded as the most important and most widely used open-source platform for it. HDFS (Hadoop Distributed File System), Hadoop's preferred storage system, reliably stores large volumes of data and has greatly promoted Hadoop's development. Although the single-master design makes Hadoop easier to deploy, it prevents Hadoop from achieving high availability and limits its scalability. Both the community and enterprises have proposed solutions to this "single node" problem, but none has been widely accepted so far.

In this thesis, we propose a distributed meta-data management framework for HDFS, based on an analysis of the advantages and disadvantages of the mainstream solutions. The framework not only solves the "single master" problem of HDFS but also improves the system's concurrency and throughput. To suit current cloud computing environments, which require large-scale data processing and the management of massive numbers of nodes, the framework uses a hierarchical design in place of the original HDFS design, in which the namespace and block meta-data are managed together; this improves the system's flexibility and scalability. We then give adjustment strategies and distribution algorithms for these two kinds of meta-data. Next, we design the corresponding management mechanisms for the framework, including node registration and deregistration, replication management and recovery, and meta-data synchronization and migration. In addition, the framework addresses several shortcomings of HDFS to ensure the correctness, efficiency and availability of the system. Finally, based on an analysis of the key HDFS code, the framework reuses the existing HDFS code and mechanisms as much as possible, while refactoring some code to improve readability and reduce coupling.

At the end of the thesis, we design experiments to test the effect of distributed meta-data management on HDFS read and write performance, startup performance, concurrency, availability and scalability. The results show that although the framework's read and write performance is slightly inferior to that of HDFS, it is better in concurrency, availability and scalability. It should therefore be better suited to the Hadoop application environment.
Keywords/Search Tags: HDFS, cloud computing, meta-data management