Design And Implementation Of A HDFS-Based File Management System

Posted on: 2017-02-16
Degree: Master
Type: Thesis
Country: China
Candidate: M M Meng
GTID: 2308330488973405
Subject: Software engineering
Abstract/Summary:
With the development of the Internet, enterprises of all kinds need to process large amounts of rapidly growing data. Small and medium-sized enterprises usually store data at the TB and PB level. A variety of distributed file systems have emerged to meet such requirements.

The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. HDFS provides high-throughput access to application data and is suitable for applications that have large data sets. However, HDFS does not provide a client management tool, a monitoring module, or a distributed search function, and therefore cannot meet the requirements of small and medium-sized enterprises.

Based on an in-depth analysis of HDFS storage technology and read/write mechanisms, this thesis designs and implements an online HDFS-based file management system. The main work of the thesis is as follows:

(1) A client-oriented HDFS-based file management system is designed and implemented. Users can easily manage huge amounts of data stored in HDFS through Web pages.
(2) An HDFS download optimization scheme and a small file storage scheme are designed, improving the storage efficiency and download speed of the HDFS-based file management system.
(3) Elasticsearch distributed search technology is used to create index files in the HDFS-based file management system. Distributed indexing and distributed search are realized by optimizing and improving the index.
(4) The manageability of the HDFS-based file management system is strengthened by deploying the Ganglia cluster monitoring tool to monitor node information and traffic information of the HDFS cluster.

Finally, extensive experiments are conducted to test the implemented system. The results show that the HDFS-based distributed file management system can efficiently manage very large files, satisfying the requirements of small and medium-sized enterprises.
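
The abstract does not describe how the small file storage scheme in item (2) works. One common way to reduce the per-file overhead of many small files in HDFS is to pack them into a single large container and keep an index of offsets. The sketch below illustrates that general idea only, in memory and without any HDFS client; the class SmallFilePack and its methods are hypothetical names and are not taken from the thesis.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class SmallFilePack:
    """Hypothetical container: many small files packed into one large blob."""

    # Concatenated contents of all small files (stands in for one large HDFS file).
    blob: bytearray = field(default_factory=bytearray)
    # Maps file name -> (offset, length) inside the blob.
    index: Dict[str, Tuple[int, int]] = field(default_factory=dict)

    def add(self, name: str, data: bytes) -> None:
        """Append one small file to the blob and record where it starts."""
        if name in self.index:
            raise KeyError(f"duplicate file name: {name}")
        self.index[name] = (len(self.blob), len(data))
        self.blob.extend(data)

    def read(self, name: str) -> bytes:
        """Return the original bytes of a small file using the offset index."""
        offset, length = self.index[name]
        return bytes(self.blob[offset:offset + length])


pack = SmallFilePack()
pack.add("a.txt", b"hello")
pack.add("b.txt", b"hdfs")
```

With this layout, the storage layer sees one large object instead of many small ones, and a read needs only an index lookup followed by a ranged read. Whether the thesis uses this or a different scheme cannot be determined from the abstract.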
Keywords/Search Tags: HDFS, Elasticsearch, distributed file system, distributed search