
Research On Data Duplication Selection Strategy In Cloud Environment

Posted on: 2014-09-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Ai
GTID: 2268330422460762
Subject: Computer application technology
Abstract/Summary:
With the emergence and development of cloud computing, cloud storage technology has advanced rapidly. Traditional file systems cannot meet the demands of cloud storage services, so distributed file systems, one of the key technologies of cloud storage, have become increasingly important. The most widely used distributed file systems on the Internet today include Google's GFS, Hadoop's HDFS, and MooseFS. Most storage nodes in these systems are personal computers with relatively low performance. To prevent data damage and loss caused by unforeseen hardware failures, power outages, hacker attacks, viruses, fires, earthquakes, terrorist attacks, and similar events, most distributed file systems adopt data replication, storing copies of the same data on different storage nodes. This raises the question of which storage node should be selected when a client accesses data in the distributed file system.

GFS and HDFS compute the distance between the client and each storage node holding a replica, then read from the node with the shortest distance. MooseFS selects the storage node that has served the fewest read and write operations. GFS, HDFS, and MooseFS are all networked distributed file systems in which storage nodes and clients exchange data over the network, so the bandwidth of a storage server directly affects how fast a client can read data: the client's read speed is proportional to that bandwidth. MooseFS's replica selection balances the I/O load across storage servers, but the server it selects for a given request does not necessarily offer the best bandwidth, which can slow down the client's reads.

This thesis proposes a bandwidth-based replica selection algorithm built on the ant algorithm.
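The two existing policies described above can be contrasted in a minimal Python sketch. It assumes a hypothetical `Replica` record and models network distance the way HDFS's rack-aware topology is commonly described (hops from each node up to their closest common ancestor in a `/datacenter/rack/host` tree). None of the names below come from GFS, HDFS, or MooseFS source code, and read time is idealized as data size divided by bandwidth.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """One copy of a chunk on a storage server (hypothetical model)."""
    server: str
    location: str          # topology path, e.g. "/dc1/rack1/host1"
    io_count: int          # reads + writes this server has served
    bandwidth_mbps: float  # currently available bandwidth to the client


def topology_distance(a: str, b: str) -> int:
    """Sum of hops from each node up to their closest common ancestor.

    For three-level paths: same host -> 0, same rack -> 2,
    same data center -> 4, different data centers -> 6.
    """
    pa, pb = a.strip("/").split("/"), b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def nearest_replica(client_location: str, replicas: list[Replica]) -> Replica:
    """GFS/HDFS-style choice: the replica closest to the client."""
    return min(replicas, key=lambda r: topology_distance(client_location, r.location))


def least_io_replica(replicas: list[Replica]) -> Replica:
    """MooseFS-style choice as described above: fewest reads and writes."""
    return min(replicas, key=lambda r: r.io_count)


def read_time_s(size_mb: float, replica: Replica) -> float:
    """Idealized transfer time (seconds) if read speed is proportional to bandwidth."""
    return size_mb * 8 / replica.bandwidth_mbps
```

With a client at `/dc1/rack1/host9` and three hypothetical replicas, A (`/dc1/rack1/host1`, 120 operations, 100 Mbit/s), B (`/dc1/rack2/host5`, 40 operations, 20 Mbit/s) and C (`/dc1/rack2/host6`, 75 operations, 500 Mbit/s), the nearest-replica policy picks A and the least-I/O policy picks B. B's idealized 100 MB read takes 40 s, versus about 1.6 s from C. That gap is what a bandwidth-aware selector targets.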
The ant algorithm is a heuristic algorithm that adjusts itself using feedback obtained from dynamic interaction with its environment and converges toward an optimal or near-optimal solution. It has been widely applied to optimization problems such as the traveling salesman problem (TSP), allocation problems, network routing, task scheduling, and graph coloring. Selecting the best replica in the MooseFS distributed file system is likewise an optimization problem. The ant algorithm exhibits positive feedback, cooperation, and parallelism, and its scalability suits a distributed file system whose state changes dynamically and randomly. These properties make the ant algorithm well suited to replica selection in distributed file systems, so a replica selection strategy based on it is feasible in principle. Experimental results show that the proposed algorithm increases the client's read speed and reduces the time the client spends accessing data.
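The abstract does not give the thesis's pheromone or heuristic formulas, so the following is only a minimal sketch of generic ant-colony replica selection under stated assumptions: each server keeps a pheromone level tau, a read picks server i with probability proportional to tau_i^alpha * eta_i^beta where the heuristic eta_i is the server's measured bandwidth, and after each read all pheromone evaporates by rho while the chosen server is reinforced in proportion to the throughput the client observed. The class name `AntReplicaSelector` and the parameters `alpha`, `beta`, `rho`, `q`, and `tau0` are hypothetical.

```python
import random


class AntReplicaSelector:
    """Hypothetical ACO-style replica selector (not the thesis's exact method).

    P(choose i) = tau_i**alpha * eta_i**beta / sum_j(tau_j**alpha * eta_j**beta),
    with eta_i = currently measured bandwidth of server i (assumed > 0).
    """

    def __init__(self, servers, alpha=1.0, beta=2.0, rho=0.1, q=0.01,
                 tau0=1.0, rng=None):
        self.alpha, self.beta, self.rho, self.q = alpha, beta, rho, q
        self.tau = {s: tau0 for s in servers}   # pheromone per server
        self.rng = rng or random.Random()

    def probabilities(self, bandwidth):
        """Selection probability per server; bandwidths must be positive."""
        weights = {s: (self.tau[s] ** self.alpha) * (bandwidth[s] ** self.beta)
                   for s in self.tau}
        total = sum(weights.values())
        return {s: w / total for s, w in weights.items()}

    def choose(self, bandwidth):
        """Roulette-wheel selection: a random draw weighted by probability."""
        probs = self.probabilities(bandwidth)
        servers = list(probs)
        return self.rng.choices(servers, weights=[probs[s] for s in servers], k=1)[0]

    def update(self, chosen, observed_mbps):
        """Evaporate all pheromone, then reinforce the server actually used."""
        for s in self.tau:
            self.tau[s] *= (1.0 - self.rho)
        self.tau[chosen] += self.q * observed_mbps   # positive feedback
```

A client would call `choose` with fresh bandwidth estimates, read from the returned server, then call `update` with the throughput it measured. Because selection is probabilistic rather than always taking the single best server, requests tend to spread over several good replicas, and evaporation lets the selector gradually forget servers whose bandwidth later degrades. This is one plausible way to reflect the positive-feedback and dynamic-adaptation properties the abstract attributes to the ant algorithm.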
Keywords: Cloud Storage, Distributed File System, Load Balancing, Ant Algorithm, Replica Selection