
Research And Optimization Of The Distributed Storage On HDFS

Posted on: 2015-07-22
Degree: Master
Type: Thesis
Country: China
Candidate: H Zhang
Full Text: PDF
GTID: 2298330452994326
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet and the rapid growth in the number of Internet users, the volume of Internet data is also increasing quickly. To provide better service for users, Internet companies need to store and analyze this data, and the concept of cloud computing emerged for this reason. Cloud computing is a good solution for the computation and storage of big data, and cloud storage, as a derivative of cloud computing, is also becoming a hot topic.

The Hadoop distributed file system, HDFS, has become the standard platform for research on cloud storage because of its high performance and high reliability. In HDFS, streaming reads and writes of large files are very efficient, but reading and writing massive numbers of small files is relatively inefficient. To address this problem, this paper presents a small-file merging strategy based on a relational database. When a user uploads small files, the system creates a user file for each user in the cluster, writes each file's metadata to the relational database, and writes the file's content into the user file. Using the metadata, the user then reads small files in streaming mode. When a user reads a file that is smaller than the file block size, the datanodes apply a load balancing strategy in which the datanode that stores the data transfers it directly. This method reduces the load on the main server and improves file transfer efficiency. In addition to optimizing the HDFS architecture, this paper also combines web technology with distributed storage to build a cloud storage platform. To capture user behavior and server status, the system uses Hive to analyze and mine the logs of the website and the cluster.

The experimental results show that this architecture overcomes the weakness of HDFS in reading and writing small files and improves the performance of reading and writing massive numbers of small files. The scheme is applicable to cloud storage systems that hold massive numbers of small files.
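The merging strategy described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: a local append-only file stands in for the per-user file on HDFS, SQLite stands in for the relational database, and the names `SmallFileStore`, `file_meta`, `upload` and `read` are hypothetical. It only shows the core idea of appending each small file to one container per user and recording its offset and length as metadata.

```python
import os
import sqlite3
import tempfile


class SmallFileStore:
    """Toy model: one merged container file per user plus a metadata table.

    Assumption: a local file replaces the HDFS user file and SQLite
    replaces the relational database mentioned in the abstract.
    """

    def __init__(self, root, db_path=":memory:"):
        self.root = root
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS file_meta ("
            " user TEXT, name TEXT, offset INTEGER, length INTEGER,"
            " PRIMARY KEY (user, name))"
        )

    def _container(self, user):
        # Path of the per-user merged file (hypothetical naming scheme).
        return os.path.join(self.root, f"{user}.merged")

    def upload(self, user, name, data):
        # Append the small file to the user's container and remember
        # where it starts and how long it is.
        with open(self._container(user), "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(data)
        self.db.execute(
            "INSERT INTO file_meta VALUES (?, ?, ?, ?)",
            (user, name, offset, len(data)),
        )
        self.db.commit()

    def read(self, user, name):
        # Look up the metadata, then read only the requested byte range.
        row = self.db.execute(
            "SELECT offset, length FROM file_meta WHERE user = ? AND name = ?",
            (user, name),
        ).fetchone()
        if row is None:
            raise FileNotFoundError(name)
        offset, length = row
        with open(self._container(user), "rb") as f:
            f.seek(offset)
            return f.read(length)


if __name__ == "__main__":
    store = SmallFileStore(tempfile.mkdtemp())
    store.upload("alice", "a.txt", b"hello")
    store.upload("alice", "b.txt", b"world!")
    print(store.read("alice", "b.txt"))
```

In this sketch many small uploads produce a single container file per user, so the storage layer tracks one large file instead of many small ones, while the per-file lookup is handled by the metadata table.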
Keywords/Search Tags: HDFS, small file optimization, file merging, load balancing, log analysis, cloud storage