
Research on Processing Massive Small Files under Hadoop

Posted on: 2016-06-13 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Ma | Full Text: PDF
GTID: 2308330473964468 | Subject: Software engineering
Abstract/Summary:
In recent years, as big data has been applied across social life, both the volume and the variety of data have grown explosively, raising the demands on storage capacity and data analysis. Big data processing technology, which extends and develops the concept of cloud computing, has therefore become a hot research direction. Hadoop is the most widely used cloud data processing platform, but because of factors inherent in its design, applying it directly to massive numbers of small files is inefficient.
To address the low efficiency of managing massive small files on the Hadoop platform, this thesis draws on the deployment and optimization of an actual project and analyzes the problem from three aspects. First, at the level of the underlying cluster architecture, it analyzes and compares the processing performance of traditional virtualization technology and container technology and adopts the better of the two: rather than the traditional virtual machine architecture, containers are used as the underlying environment on which the Hadoop platform is deployed, so as to improve the platform's data processing capability. Second, at the file storage level, building on the existing HDFS of the Hadoop platform, it applies the basic idea of aggregating and classifying small files, adapted to the actual situation, to improve the platform's storage efficiency for large numbers of small files. Third, at the data processing level, it adopts a newer computing framework from the Hadoop ecosystem, based on an in-memory computing model, to improve the efficiency of processing massive small-file data.
Finally, a test platform is built, and experimental data are used to demonstrate the optimal scheme for processing massive small files on the Hadoop platform.
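The abstract does not spell out the storage method, so the following is only a minimal sketch of one common reading of aggregation and classification: small files are grouped by category and packed into one large blob per category, with an index recording where each original file lives. It is not the thesis's implementation. The names `classify`, `merge_small_files`, and `read_file` are hypothetical, classification by file extension is an assumption, and plain in-memory bytes stand in for HDFS files.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def classify(name: str) -> str:
    """Hypothetical classification rule (an assumption): group by extension."""
    suffix = PurePosixPath(name).suffix.lower()
    return suffix.lstrip(".") or "other"


def merge_small_files(files: dict[str, bytes]):
    """Pack small files into one blob per category and build an offset index.

    Returns (blobs, index) where blobs maps category -> merged bytes and
    index maps file name -> (category, offset, length).
    """
    blobs = defaultdict(bytearray)
    index = {}
    for name, data in sorted(files.items()):
        category = classify(name)
        offset = len(blobs[category])  # the file starts at the current end of the blob
        blobs[category].extend(data)
        index[name] = (category, offset, len(data))
    return {c: bytes(b) for c, b in blobs.items()}, index


def read_file(blobs: dict[str, bytes], index: dict, name: str) -> bytes:
    """Recover one original file from its category blob via the index."""
    category, offset, length = index[name]
    return blobs[category][offset:offset + length]


files = {"a.log": b"alpha", "b.log": b"beta", "c.csv": b"1,2"}
blobs, index = merge_small_files(files)
```

In this sketch three small files become two merged blobs, so a storage layer that tracks metadata per file would track two entries plus the index instead of three. Whether the thesis uses a comparable index structure is not stated in the abstract.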
Keywords/Search Tags: Hadoop, Small File, Virtualization, Container, Cloud computing