Design And Implementation Of The Key Techniques For Storing And Retrieving Massive Small Files In Hadoop

Posted on:2016-03-09

Degree:Master

Type:Thesis

Country:China

Candidate:Y C Jia

Full Text:PDF

GTID:2308330473960969

Subject:Electronic and communication engineering

Abstract/Summary:

The rapid development of the Internet has produced enormous volumes of data, and traditional data processing techniques struggle to handle data at this scale effectively. Hadoop, a framework for processing massive data, has therefore come into wide use; its distributed file system, HDFS, runs on large numbers of ordinary, inexpensive computers. HDFS scales smoothly and provides high-throughput data access, which makes it suitable for large-scale data processing.

HDFS adopts a “master/slave” architecture in which a cluster has only one NameNode and multiple DataNodes. Hadoop is designed to handle large files: the NameNode keeps metadata in memory for every file stored in HDFS, and the size of that metadata is not directly related to the size of the file. Hence, if massive numbers of small files are stored in HDFS, the NameNode’s memory cannot accommodate the resulting volume of metadata, and its memory size becomes the bottleneck for scaling the system. In the Internet era, however, social networks, blogs, and shopping sites produce massive numbers of small files, and most files uploaded to cloud disk applications are small files such as documents, pictures, and audio clips. This poses a great challenge to the use of Hadoop.

To address Hadoop’s inefficiency in storing small files, this thesis combines multiple small files into one large file and establishes a mapping from each small file to the large file. To address Hadoop’s inefficiency in retrieving small files, R-tree, inverted index, and global mapping management techniques are used to provide different retrieval methods based on a file’s name and metadata. Finally, simulation results show that the proposed methods can significantly improve Hadoop’s efficiency in handling massive small files.
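
The merge-and-map idea in the abstract can be illustrated with a small Python sketch. It is not the thesis's implementation: the class `MergedStore`, its methods, and the tag-based metadata are hypothetical names chosen for illustration. An in-memory buffer stands in for the large file that would live in HDFS, and the R-tree index is omitted. The sketch shows only the general technique: small files are appended to one large file, a name-to-(offset, length) mapping locates each of them, and an inverted index supports lookup by metadata.

```python
import io
from collections import defaultdict


class MergedStore:
    """Hypothetical in-memory stand-in for one merged large file plus its indexes."""

    def __init__(self):
        self._blob = io.BytesIO()          # plays the role of the large file
        self._mapping = {}                 # small-file name -> (offset, length)
        self._inverted = defaultdict(set)  # metadata tag -> names carrying that tag

    def put(self, name, data, tags=()):
        """Append one small file and record where it landed."""
        if name in self._mapping:
            raise KeyError(f"duplicate small file: {name}")
        offset = self._blob.seek(0, io.SEEK_END)  # current end of the large file
        self._blob.write(data)
        self._mapping[name] = (offset, len(data))
        for tag in tags:
            self._inverted[tag].add(name)

    def get(self, name):
        """Retrieve a small file by name through the mapping."""
        offset, length = self._mapping[name]
        self._blob.seek(offset)
        return self._blob.read(length)

    def find(self, *tags):
        """Return the names of files that carry every given metadata tag."""
        if not tags:
            return set()
        postings = [self._inverted.get(tag, set()) for tag in tags]
        return set.intersection(*postings)


store = MergedStore()
store.put("notes.txt", b"hello", tags=("document",))
store.put("cat.jpg", b"\xff\xd8\xff", tags=("picture", "2016"))
store.put("dog.jpg", b"\xff\xd8", tags=("picture",))

print(store.get("notes.txt"))
print(sorted(store.find("picture")))
```

In this arrangement the large file contributes a single entry to the file system's namespace regardless of how many small files it holds, which is why merging relieves the pressure on NameNode memory; the cost is that the mapping and indexes must be maintained separately.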

Keywords/Search Tags:

Hadoop, small files, storage, retrieval data

Related items

1	Research And Implementation Of Small Files Storage Management Based On Hadoop
2	The Research And Implementation Of Storing Massive Small Files In Cloud Storage
3	Research Of Improving Storage Of Replica And Small Files Merging And Access Optimization On Hadoop Platform
4	Research On Access Optimization Of Small Files In Hadoop Cluster
5	Research And Optimization Of Small Files Processing Techniques In Hadoop
6	Design And Implementation Of Cloud Storage System Based On Hadoop
7	Research And Implementation Of Cloud Storage Platform Based On Hadoop
8	The Research On Storage Of Massive Small Air Cargo Files Based On Hadoop
9	Research On Processing Techniques Of Massive Small Files Based On Hadoop
10	Study On Processing Of Massive Small Files Based On Hadoop