Design And Implementation Of The Key Techniques For Storing And Retrieving Massive Small Files In Hadoop

Posted on: 2016-03-09  Degree: Master  Type: Thesis
Country: China  Candidate: Y C Jia
GTID: 2308330473960969  Subject: Electronic and communication engineering
Abstract/Summary:
The rapid development of the Internet has produced enormous quantities of data, and traditional data processing techniques struggle to handle data at this scale effectively. As a result, Hadoop has become a widely used framework for processing massive data; its distributed file system, HDFS, runs on large clusters of inexpensive commodity computers. HDFS scales smoothly and provides high-throughput data access, which makes it suitable for large-scale data processing.

HDFS adopts a “master/slave” architecture in which a cluster has only one NameNode and multiple DataNodes. Hadoop is designed to handle large files: the NameNode keeps metadata in memory for every file stored in HDFS, and the size of that metadata is not directly related to the size of the file. Hence, if massive numbers of small files are stored in HDFS, the NameNode’s memory cannot accommodate the resulting volume of metadata, and its memory size becomes the bottleneck for scaling the system. In the Internet era, however, social networks, blogs and shopping sites produce massive numbers of small files, and most files uploaded to cloud disk applications are small files such as documents, pictures and audio clips. This poses a great challenge to the use of Hadoop.

To address Hadoop’s inefficiency in storing small files, this thesis merges multiple small files into one big file and establishes a mapping from each small file to the big file that contains it. To address Hadoop’s inefficiency in retrieving small files, an R-tree, an inverted index and global mapping management are used to provide different retrieval methods based on a file’s name and metadata. Finally, simulation results show that the proposed methods can significantly improve Hadoop’s efficiency in handling massive small files.
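The merge-and-map idea described above can be illustrated with a minimal in-memory sketch. It is not the thesis's implementation: the class `SmallFileStore`, the `Entry` record, the container name `merged-0001` and the metadata key `type` are all hypothetical, a Python `bytes` object stands in for a large HDFS file, and the R-tree part of the retrieval scheme is omitted. Small files are concatenated into one container, a global mapping records where each one lives, and an inverted index maps metadata values back to file names.

```python
import io
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Location of one small file inside a merged container (hypothetical layout)."""
    container: str
    offset: int
    length: int


class SmallFileStore:
    """Toy stand-in for merged storage; real code would write containers to HDFS."""

    def __init__(self):
        self.containers = {}               # container name -> merged bytes
        self.mapping = {}                  # small-file name -> Entry (global mapping)
        self.inverted = defaultdict(set)   # (metadata key, value) -> file names

    def merge(self, container, files):
        """Concatenate (name, data, metadata) triples into one container."""
        buf = io.BytesIO()
        for name, data, meta in files:
            offset = buf.tell()            # where this small file starts
            buf.write(data)
            self.mapping[name] = Entry(container, offset, len(data))
            for key, value in meta.items():
                self.inverted[(key, value)].add(name)
        self.containers[container] = buf.getvalue()

    def read(self, name):
        """Retrieve a small file by name via the global mapping."""
        entry = self.mapping[name]
        blob = self.containers[entry.container]
        return blob[entry.offset:entry.offset + entry.length]

    def find(self, key, value):
        """Retrieve file names by metadata via the inverted index."""
        return sorted(self.inverted.get((key, value), set()))


if __name__ == "__main__":
    store = SmallFileStore()
    store.merge("merged-0001", [
        ("a.txt", b"hello", {"type": "document"}),
        ("b.jpg", b"\xff\xd8image", {"type": "picture"}),
        ("c.txt", b"world!", {"type": "document"}),
    ])
    print(store.read("c.txt"))
    print(store.find("type", "document"))
```

With this layout the file system tracks one large file instead of three small ones, while lookups by name or by metadata go through the two side indexes. The sketch assumes file names are unique and does not handle deletion or updates.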
Keywords/Search Tags: Hadoop, small files, storage, retrieval data