
Research On The Processing Method Of Massive Small Files In Cloud Computing Environment

Posted on: 2017-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: T Shao
Full Text: PDF
GTID: 2308330491454677
Subject: Computer software and theory
Abstract/Summary:
Hadoop is an open-source software framework that has in recent years become a representative cloud computing platform, valued for its reliability, scalability, and distributed computing and storage. Hadoop consists mainly of HDFS (Hadoop Distributed File System) and MapReduce (a parallel programming model). As a core component of Hadoop, HDFS is a file system built on a master-slave model, with the master and slaves called the NameNode and DataNodes respectively. It is designed for streaming access to large data sets and, because of this architecture, performs poorly on small files, especially massive numbers of small files. The prevailing approach to the problem is to merge small files; however, there is no clear definition of "how small is small". It is generally accepted that files much smaller than the default HDFS block size are regarded as small files, but this criterion is not precise enough. The question of where to place the demarcation point for small files strongly influences the file storage strategy and the handling of massive small files, and it remains open.

Aiming at the unclear definition of small files in HDFS, we propose a method called Cut-GAR to draw the line between "big files" and "small files". We study the structure of HDFS and its file storage policy, and analyze the file access process and the quantization standard it uses, so as to identify the causes of its poor performance when facing massive small files. The main contributions are the following two points:

(1) We survey existing work on the demarcation point of small files and analyze it at three levels: native solutions within the Hadoop architecture, general solutions, and application-specific solutions. By comparison, we identify the weaknesses of the different methods.

(2) To solve the problem, we propose a method called Cut-GAR to determine the demarcation point of small files. It uses grey relational analysis as the core algorithm, with three parameters as evaluation criteria: memory consumption of the NameNode, file upload speed, and file download speed. First, we establish the relationship among upload/download speed, NameNode memory consumption, and file size, and obtain three optimal cut-off points, one per criterion. Second, grey relational analysis is applied to quantify the degree of influence of the three factors; the weights of the evaluation indices are calculated, as well as the relational degree between indices and the objective. Finally, the three optimal cut-off points are multiplied by the corresponding index weights and summed, yielding an approximately optimal cut-off point. To verify the stability of Cut-GAR, the HDFS block size is changed to 16 MB and 32 MB; meanwhile, Cut-GAR is compared with the original method to verify its effectiveness.
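To make the weighting step concrete, the sketch below illustrates how grey relational analysis can turn three evaluation indices into weights and combine per-index cut-off points into one approximate optimum. It is a minimal illustration, not the thesis's actual implementation: the function name grey_relational_weights, the measurement arrays, and the per-index cut-off values are hypothetical stand-ins, and the normalization and distinguishing coefficient follow common GRA conventions rather than anything stated in the abstract.

```python
import numpy as np

def grey_relational_weights(reference, comparisons, rho=0.5):
    """Grey relational analysis (GRA) sketch.

    reference  : 1-D series the indices are compared against
                 (here, the candidate file sizes).
    comparisons: 2-D array, one row per evaluation index
                 (NameNode memory, upload speed, download speed).
    rho        : distinguishing coefficient, conventionally 0.5.
    Returns weights (one per index) that sum to 1.
    """
    ref = np.asarray(reference, dtype=float)
    cmp_ = np.asarray(comparisons, dtype=float)

    # Normalize every series to [0, 1] so the indices are dimensionless.
    def normalize(x):
        span = x.max() - x.min()
        return (x - x.min()) / span if span else np.zeros_like(x)

    ref_n = normalize(ref)
    cmp_n = np.array([normalize(row) for row in cmp_])

    # Absolute differences between the reference and each comparison series.
    delta = np.abs(cmp_n - ref_n)
    d_min, d_max = delta.min(), delta.max()

    # Grey relational coefficients, then per-index relational grades.
    coeff = (d_min + rho * d_max) / (delta + rho * d_max)
    grades = coeff.mean(axis=1)

    # Convert relational grades into normalized weights.
    return grades / grades.sum()


# Hypothetical measurements at candidate cut-off sizes (MB); illustrative only.
candidate_sizes = np.array([1, 2, 4, 8, 16, 32], dtype=float)
namenode_memory = np.array([0.90, 0.80, 0.60, 0.50, 0.45, 0.40])
upload_speed    = np.array([0.30, 0.40, 0.60, 0.70, 0.75, 0.80])
download_speed  = np.array([0.35, 0.45, 0.65, 0.72, 0.78, 0.80])

weights = grey_relational_weights(
    candidate_sizes, [namenode_memory, upload_speed, download_speed])

# Suppose each index, taken alone, suggested its own optimal cut-off (MB).
per_index_cutoffs = np.array([4.0, 8.0, 8.0])   # purely illustrative values
approx_cutoff = float(weights @ per_index_cutoffs)
print(f"weights = {weights}, approximate cut-off ~= {approx_cutoff:.1f} MB")
```

The weighted sum in the last step mirrors the abstract's description: each criterion's preferred cut-off contributes in proportion to how strongly that criterion relates to the objective, so no single factor (memory, upload speed, or download speed) dictates the demarcation point on its own.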
Keywords/Search Tags:HDFS, small file, Cut-off point, grey relational analysis