Research On Distributed File Placement Algorithm Without Depending On Popularity Information

Posted on: 2019-10-22
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Tian
Full Text: PDF
GTID: 2428330566491425
Subject: Software engineering
Abstract/Summary:
In the era of big data, the storage and management of massive data increasingly depend on distributed file systems, and the file placement algorithm is one of the important factors affecting the overall performance of a distributed file system. In recent years, researchers have studied file placement in distributed file systems based on file access popularity and have made some progress. However, file access popularity is an uncertain, dynamic value, and it is unknown at the time a file is stored. This thesis focuses on this issue and studies file placement in distributed file systems.

First, this thesis studies the distributed file system. By analyzing the relevant source code of the HDFS (Hadoop Distributed File System) file placement algorithm, it extracts a model of the default HDFS file placement algorithm; by analyzing and verifying data storage results on a Hadoop cluster, it points out the shortcomings of that default algorithm.

Second, in response to the uncertainty of file access popularity information, this thesis proposes a distributed file placement algorithm that does not depend on file access popularity information, called WDFPA (Distributed File Placement Algorithm Without Depending on Popularity Information). An analysis of the access lifetime of a file shows a strong correlation between file creation time and file access popularity. Therefore, following the distribution of file accesses, the thesis divides time into intervals using an exponential function, determines the time interval a file belongs to from its creation time, and places the file according to that interval.

Finally, this thesis proposes a dynamic replica management strategy based on the WDFPA algorithm. Because file access popularity differs between time intervals, the strategy assigns different replica levels to files in different time intervals and then dynamically changes the number of replicas for the files in each interval according to the storage load of the distributed file system, thereby dynamically adjusting the storage load of the file system.

The experimental results show that the proposed file placement algorithm can achieve load balancing across the nodes of a distributed file system and can improve access load balancing across nodes. Moreover, the proposed dynamic replica management strategy can dynamically change the number of replicas according to the overall storage load of the file system, and so adjust the storage load of the file system.
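The time-interval idea described in the abstract can be illustrated with a small Python sketch. It is not the thesis's implementation: the abstract does not give the exponential base, the time unit, the replica levels, or the load threshold, so the function names `time_interval_index` and `replica_count` and the parameters `base_seconds`, `growth`, `max_replicas`, `min_replicas`, and `high_load` are all hypothetical choices made for illustration. The sketch assumes interval boundaries grow geometrically with file age, that older intervals get fewer replicas, and that a high storage load lowers the replica count by one.

```python
def time_interval_index(age_seconds: float,
                        base_seconds: float = 3600.0,
                        growth: float = 2.0) -> int:
    """Map a file's age (now minus creation time) to a time interval.

    Assumed layout (hypothetical, not from the thesis):
      interval 0 covers ages [0, base_seconds)
      interval k covers [base_seconds * growth**(k-1), base_seconds * growth**k)
    """
    if age_seconds < 0:
        raise ValueError("age_seconds must be non-negative")
    if base_seconds <= 0 or growth <= 1:
        raise ValueError("base_seconds must be > 0 and growth must be > 1")
    index = 0
    upper = base_seconds
    # Widen the upper boundary exponentially until it exceeds the file's age.
    while age_seconds >= upper:
        upper *= growth
        index += 1
    return index


def replica_count(interval_index: int,
                  storage_load: float,
                  max_replicas: int = 3,
                  min_replicas: int = 1,
                  high_load: float = 0.8) -> int:
    """Pick a replica count for files in a given time interval.

    Assumed policy (hypothetical): newer intervals start with more replicas,
    and one replica is dropped when the cluster's storage load
    (a fraction between 0 and 1) reaches high_load.
    """
    # One replica fewer per older interval, never below the minimum.
    level = max(min_replicas, max_replicas - interval_index)
    if storage_load >= high_load:
        level = max(min_replicas, level - 1)
    return level


if __name__ == "__main__":
    # A file created 2.5 hours ago, on a cluster that is 85% full.
    idx = time_interval_index(2.5 * 3600)
    print(idx, replica_count(idx, storage_load=0.85))
```

With these assumed defaults, a file moves to a higher interval index as it ages, so a placement policy could group files by index and a replica manager could lower the replica count of older groups first when storage fills up.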
Keywords/Search Tags: distributed file system, file access popularity, file placement, load balance, replica management strategy