
Research and Optimization of Metadata Management in Distributed File Systems

Posted on: 2011-12-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Luan
Full Text: PDF
GTID: 2178360308463593
Subject: Computer system architecture
Abstract/Summary:
Although the performance and capacity of traditional storage devices have improved greatly over the past few decades, they have not kept pace with the growth in network and processor performance, and the capacity and availability of traditional equipment still fall short of application requirements. A distributed file system provides a unified interface to high-capacity storage, and improving system performance, capacity, and scalability has become a research hotspot. This thesis studies mainstream distributed file systems such as GPFS, Lustre, and GFS, and analyzes the strengths and weaknesses of their system architectures. Because data in today's distributed file systems is stored across many nodes and every data access requires metadata, metadata management has become a key technology. After analyzing various metadata management architectures and summarizing their respective strengths and weaknesses, this thesis focuses on optimizing metadata management.

The HDFS file system stores all metadata in the metadata server's memory; as the number of files and the system capacity grow, the metadata server becomes the performance bottleneck of the system. In addition, the memory size of the metadata server limits the amount of metadata the system can handle, which restricts scalability. To allow the file system to store more and larger files, this thesis proposes and implements a two-tier metadata management system composed mainly of a primary metadata server, a secondary metadata server, and a DB server. The primary metadata server interacts with clients and data storage servers and synchronizes metadata to the DB server; the DB server persists the metadata and responds to requests from the primary metadata server. When the primary metadata server fails, client requests can be transferred to the secondary metadata server immediately.

Metadata processing is an important part of the file system, and its performance affects the performance of the whole system. A metadata cache can greatly reduce interaction between the metadata server and the DB server, lowering response time and improving performance, so the metadata cache is an integral part of the system. Coordinating the multi-level metadata caches held at multiple locations is another key technology. Finally, this thesis runs a suite of tests on the HDFS file system, including benchmark tests, MapReduce application tests, and scalability tests. The results show that the designed and implemented metadata management system substantially improves the system as a whole: the file system can handle larger files, store more files, and achieve better fault tolerance, with only a limited and acceptable performance loss.
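The abstract describes the two-tier design only at a high level. As a rough illustration, the sketch below shows one way a cache-first metadata lookup with DB fallback and primary-to-secondary failover could be wired together in Java (the language HDFS is written in). This is a minimal sketch under stated assumptions, not the thesis's actual implementation; all class and method names (MetadataStore, MetadataServer, FailoverClient, and so on) are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Persistent tier: stands in for the DB server that stores metadata durably. */
interface MetadataStore {
    Optional<String> load(String path);
    void save(String path, String metadata);
}

/** A trivial in-memory stand-in for the DB server, used only for this demo. */
class InMemoryStore implements MetadataStore {
    private final Map<String, String> rows = new ConcurrentHashMap<>();
    public Optional<String> load(String path) { return Optional.ofNullable(rows.get(path)); }
    public void save(String path, String metadata) { rows.put(path, metadata); }
}

/** Metadata server: answers from its in-memory cache, falling back to the store on a miss. */
class MetadataServer {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final MetadataStore store;

    MetadataServer(MetadataStore store) { this.store = store; }

    Optional<String> lookup(String path) {
        String cached = cache.get(path);
        if (cached != null) return Optional.of(cached);   // cache hit: no DB round trip
        Optional<String> loaded = store.load(path);       // cache miss: ask the DB server
        loaded.ifPresent(m -> cache.put(path, m));        // populate the cache for next time
        return loaded;
    }

    void update(String path, String metadata) {
        store.save(path, metadata);                       // write through to persistent storage
        cache.put(path, metadata);                        // keep the cache consistent
    }
}

/** Client-side routing: use the primary metadata server, fail over to the secondary on error. */
class FailoverClient {
    private final MetadataServer primary;
    private final MetadataServer secondary;

    FailoverClient(MetadataServer primary, MetadataServer secondary) {
        this.primary = primary;
        this.secondary = secondary;
    }

    Optional<String> getMetadata(String path) {
        try {
            return primary.lookup(path);
        } catch (RuntimeException primaryDown) {          // e.g. primary unreachable
            return secondary.lookup(path);
        }
    }
}

public class TwoTierMetadataDemo {
    public static void main(String[] args) {
        MetadataStore db = new InMemoryStore();           // shared persistent tier
        MetadataServer primary = new MetadataServer(db);
        MetadataServer secondary = new MetadataServer(db);
        FailoverClient client = new FailoverClient(primary, secondary);

        primary.update("/data/file1", "size=64MB,blocks=1");
        System.out.println(client.getMetadata("/data/file1").orElse("not found"));
    }
}
```

Because both metadata servers read through to the same persistent tier, the secondary can serve requests after a primary failure without a separate recovery step; how the thesis actually synchronizes state between the servers is not detailed in the abstract.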
Keywords/Search Tags: Distributed File System, Metadata Management, Cache System, Performance Test