Research On Fault-Tolerant Storage Technology In Cloud Computing Environment

Posted on: 2012-07-14
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Pei
Full Text: PDF
GTID: 2218330362460129
Subject: Computer Science and Technology
Abstract/Summary:
Cloud computing has drawn significant attention from both academia and industry, and as a new network computing model it plays an increasingly important role in scientific and business computing. The data center is an essential component of cloud computing: it hosts large numbers of machines and customers and stores a growing volume of important data. As a result, how to store "big data" efficiently and with high fault tolerance has recently become a hot topic. Researchers have proposed various data center designs and data redundancy methods. However, little work has focused on how to efficiently handle node failures in storage, which are quite common in data centers. Improving the fault tolerance of data storage with respect to node failures is therefore valuable. Our work tackles this issue by studying three complementary problems: the design of a fault-tolerant data center network topology, the repair strategy for erasure coding, and the design of the data placement scheme.

First, to improve fault tolerance, scalability, and throughput in the data center, a novel data center network topology, DCUBE, and its corresponding routing algorithm are proposed. We add redundant connections between servers to improve the fault tolerance of the topology, and then propose an efficient routing algorithm. To improve the scalability of DCUBE, a modular construction method is presented. Furthermore, to increase network throughput, a multiple parallel path routing algorithm is proposed. Analysis and simulation results show that fault tolerance, scalability, and network throughput are improved compared with classical data center network topologies such as DCell and BCube.

Second, to improve the fault tolerance that is degraded by the long repair time of current erasure coding methods, a tree-based parallel repair algorithm, TPR, is proposed. When nodes fail, TPR constructs multiple trees that are used to recover the nodes in parallel, so that the repair time is reduced. Furthermore, TPR improves the successful repair rate by using optimal paths that have high bandwidth. We also introduce a scheme that optimizes data availability by maximizing the utilization of available bandwidth, in order to reduce the probability of new node failures. Simulation results show that the repair time, the successful repair probability, and the failure rate are improved compared with the serial repair algorithm and the cooperative repair algorithm.

Third, to improve the fault tolerance of data placement in the data center, a fault-tolerant data placement algorithm, FDPA, is proposed. FDPA stores data with a high access frequency on nodes that have short access delays to other nodes, so that the average access delay of the data objects is decreased. Furthermore, FDPA places the data blocks of the same data object on different nodes, subject to a storage capacity threshold, in order to reduce the influence between blocks of the same data object. Simulation results confirm that, compared with the random placement strategy and the CRUSH method, FDPA reduces the average access delay and increases the mean time to failure.
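The placement idea described for FDPA (frequently accessed data on low-delay nodes, blocks of one object spread over different nodes, limited by a storage threshold) can be illustrated with a small greedy sketch. This is not the thesis's algorithm: the function name `place_objects`, its parameters, and the modelling of the storage threshold as a per-node count of free block slots are all assumptions made for illustration.

```python
from typing import Dict, List


def place_objects(
    access_freq: Dict[str, float],   # hypothetical input: object -> access frequency
    blocks_per_object: int,          # hypothetical input: blocks per data object
    mean_delay: Dict[str, float],    # hypothetical input: node -> mean delay to other nodes
    capacity: Dict[str, int],        # assumed threshold: node -> free block slots
) -> Dict[str, List[str]]:
    """Greedy sketch: the hottest objects get the lowest-delay nodes first,
    and each block of an object goes to a different node."""
    free = dict(capacity)
    placement: Dict[str, List[str]] = {}

    # Handle objects in order of decreasing access frequency.
    for obj in sorted(access_freq, key=access_freq.get, reverse=True):
        # Nodes that still have room, lowest delay first (ties broken by name).
        candidates = sorted(
            (n for n in free if free[n] > 0),
            key=lambda n: (mean_delay[n], n),
        )
        if len(candidates) < blocks_per_object:
            raise ValueError(f"not enough nodes with free space for {obj!r}")

        # Distinct nodes, so one node failure loses at most one block of obj.
        chosen = candidates[:blocks_per_object]
        for node in chosen:
            free[node] -= 1
        placement[obj] = chosen

    return placement


if __name__ == "__main__":
    result = place_objects(
        access_freq={"hot": 10.0, "cold": 1.0},
        blocks_per_object=2,
        mean_delay={"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0},
        capacity={"a": 1, "b": 1, "c": 2, "d": 2},
    )
    print(result)
```

In this toy setup the frequently accessed object takes the two lowest-delay nodes, and the less frequently accessed object falls back to the remaining nodes once the first two are full. A real scheme would also need to weigh failure correlation between nodes, which this sketch ignores.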
Keywords/Search Tags: Cloud Computing, Fault Tolerant Storage, Data Center, Erasure Coding, Data Placement