
Technical Research on the Optimization of File Storage in HDFS

Posted on: 2014-04-08
Degree: Master
Type: Thesis
Country: China
Candidate: D Zhang
Full Text: PDF
GTID: 2268330401469454
Subject: Computer application technology
Abstract/Summary:
At present, faced with ever-growing volumes of data, the computing field has proposed a new computational model: cloud computing. Hadoop is an open-source framework for large-scale distributed computing; it offers high throughput, high reliability, and high scalability, and is therefore widely used in the cloud computing field. The Hadoop Distributed File System (HDFS) is designed to run on commodity hardware: it is highly fault tolerant, can be deployed on inexpensive machines, provides high-throughput data access, suits applications with very large data sets, and allows data to be read from the file system as a stream.

However, as a distributed file system that is still maturing, HDFS inevitably has some defects in its data storage. For example, when HDFS stores replicas of data, choosing a Datanode on the rack at random may lead to Datanode load imbalance, which degrades the performance of the whole system. Moreover, HDFS was originally designed for streaming storage of large files and is not optimized for small files, so its performance when handling small files is very poor.

This thesis first gives a brief introduction to the development of distributed file systems, then analyzes the Hadoop Distributed File System in depth, covering its architecture, metadata management, and file read and write processes, and examines the performance and shortcomings of existing schemes for HDFS data storage and small-file storage. The main contributions of this thesis are as follows:

1. Because randomly selecting a Datanode on the rack for data storage can lead to Datanode load imbalance and related problems, a method is proposed that applies multi-objective optimization: based on the current running condition of each Datanode, it searches for the Datanode whose overall conditions are best for storing the data.
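To illustrate the node-selection idea, the following sketch ranks the Datanodes on a rack by a weighted combination of load metrics and picks the least loaded one. The metrics, weights, and names here are illustrative assumptions, not the thesis's actual multi-objective algorithm or any HDFS API.

```python
from dataclasses import dataclass

@dataclass
class NodeStatus:
    """Hypothetical snapshot of a Datanode's current running condition."""
    name: str
    disk_usage: float  # fraction of disk used, 0.0-1.0
    cpu_load: float    # fraction of CPU busy, 0.0-1.0
    net_load: float    # fraction of bandwidth in use, 0.0-1.0

def score(node: NodeStatus) -> float:
    """Composite load score; lower means a better storage target.
    The weights are illustrative, not taken from the thesis."""
    return 0.5 * node.disk_usage + 0.3 * node.cpu_load + 0.2 * node.net_load

def select_datanode(rack: list[NodeStatus]) -> NodeStatus:
    """Pick the Datanode with the lowest composite load on the rack."""
    return min(rack, key=score)

rack = [
    NodeStatus("dn1", 0.90, 0.40, 0.30),
    NodeStatus("dn2", 0.30, 0.20, 0.10),
    NodeStatus("dn3", 0.60, 0.80, 0.50),
]
print(select_datanode(rack).name)  # dn2: the lowest composite load
```

A real implementation would combine the objectives with a proper multi-objective technique rather than a fixed weighted sum, but the effect is the same: writes are steered toward lightly loaded Datanodes instead of a random one.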
This method balances the data stored across the Datanodes and also improves read and write performance.

2. Practical applications generate large numbers of small files. To address the shortcomings of HDFS in storing them, a strategy of merging small files and caching them on the Client is proposed. Small files are merged on the Client into larger files, which are then stored in HDFS together with metadata describing their layout. When a small file is read, the Client caches the large file that contains it; when that small file, or another small file in the same large file, is read again, it can be served directly from the Client cache. This reduces the number of requests the Client must send to the Namenode for metadata and to the Datanodes for data, greatly improving the storage efficiency of small files.
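The merge-and-cache strategy can be sketched as follows. This is a simplified in-memory simulation under assumed names, not real HDFS client code: small files are packed into one blob with an (offset, length) index, and the client caches the whole blob after the first read so that later reads of any file inside it need no further trips to the cluster.

```python
def merge_small_files(files: dict[str, bytes]) -> tuple[bytes, dict[str, tuple[int, int]]]:
    """Pack small files into one blob; return it with an (offset, length) index."""
    index, chunks, offset = {}, [], 0
    for name, data in files.items():
        index[name] = (offset, len(data))
        chunks.append(data)
        offset += len(data)
    return b"".join(chunks), index

class Client:
    """Toy client: the blob and index stand in for data stored on HDFS."""
    def __init__(self, blob: bytes, index: dict[str, tuple[int, int]]):
        self.blob, self.index = blob, index
        self.cache = None        # merged block, cached after the first read
        self.remote_reads = 0    # counts simulated round trips to the cluster

    def read(self, name: str) -> bytes:
        if self.cache is None:   # first read: fetch the whole merged block once
            self.remote_reads += 1
            self.cache = self.blob
        off, length = self.index[name]
        return self.cache[off:off + length]

blob, index = merge_small_files({"a.txt": b"alpha", "b.txt": b"beta"})
client = Client(blob, index)
print(client.read("a.txt"), client.read("b.txt"), client.remote_reads)
# b'alpha' b'beta' 1  -- the second read is served from the client cache
```

In the actual system the index would live in HDFS metadata and the blob would be an HDFS block, but the sketch shows why merging helps: one Namenode lookup and one Datanode transfer serve many small-file reads.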
Keywords/Search Tags: Hadoop Distributed File System (HDFS), storage node selection, small-file storage