
Research On Key Technologies Of Replica Consistency For Cloud Storage

Posted on: 2016-02-22 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: Y Y Yang | Full Text: PDF
GTID: 1108330503953327 | Subject: Computer application technology
Abstract/Summary:
With the recent development of the cloud computing paradigm, the importance and value of cloud storage have been widely acknowledged. As derivatives of distributed systems and databases, cloud storage systems adopt replication, a technique drawn from both domains that has been intensively studied and widely applied, as the necessary means to increase availability, to enable fast access through local replicas, and to guarantee fault tolerance. With the rapid evolution of enabling technologies and increasingly demanding application requirements, the architectures of cloud storage systems have evolved from a single data center to multiple data centers across different geographic locations. In such large-scale environments, replication faces many new challenges and needs further refinement.

Ensuring data consistency across replicas is a crucial issue when using replication. On the one hand, the stronger the consistency level, the higher the cost of enforcing it and the lower the degree of scalability. On the other hand, weaker consistency is cheaper but comes at the cost of potential consistency violations. In fact, there is no "one-size-fits-all" consistency model for real-world applications, owing to the diversity and dynamism of their consistency demands. In this context, replication implies various tradeoffs between consistency and other factors such as performance and cost. From the perspectives of both theoretical research and practical application, achieving the highest possible performance while maintaining a strong consistency level, and systematically weighing the tradeoffs between consistency and other factors, are significant and challenging topics.

This dissertation starts from the design of replicated state machines (RSMs) and their strong consistency protocols; its main work focuses on developing a new strong consistency protocol that provides high throughput in large-scale cloud environments. A quantitative analysis model is then developed as a complement, to analyze the new protocol's performance. In the last part of the dissertation, consistency tradeoffs are explored in a preliminary fashion. More specifically, the main contributions and innovations of this dissertation are as follows:

(1) Although most existing designs of replicated state machines and their corresponding strong consistency protocols, under different resource constraints or optimization objectives, exhibit similar features, their principles are usually stated textually and redundantly. On the basis of an in-depth investigation of typical designs, the common characteristics of various RSMs are extracted. Through feature abstraction and property definition, this dissertation introduces a modular abstract framework for RSMs. The framework presents a faithful deconstruction of the replicated state machine and, to some extent, decouples the design of its modules. With different module implementations, the framework can be used to reconstruct known RSMs, and to construct new ones, straightforwardly and effectively. Furthermore, a skeletal protocol is provided within the framework to specify and simplify the design of the corresponding strong consistency protocols. This modular abstract framework is an effective complement to the traditional approach and offers practical guidance for designing RSMs and strong consistency protocols under resource constraints or optimization objectives.
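As an illustration of the modular idea, the following Python sketch shows one way such a framework could decompose an RSM into exchangeable modules. The module names, method signatures, and composition are assumptions made for this sketch; the dissertation's actual framework is defined through feature abstraction and property definition, not by this code.

    # A hypothetical decomposition of an RSM into exchangeable modules.
    from abc import ABC, abstractmethod

    class OrderingModule(ABC):
        """Agrees on a total order of commands across replicas
        (e.g., a Paxos-style protocol could implement this)."""
        @abstractmethod
        def propose(self, command: bytes) -> int:
            """Submit a command; return the slot it was ordered into."""

    class ExecutionModule(ABC):
        """Applies committed commands deterministically, in slot order."""
        @abstractmethod
        def apply(self, slot: int, command: bytes) -> bytes:
            """Execute the command at the given slot; return its result."""

    class FailureDetectorModule(ABC):
        """Reports replicas currently suspected of having failed."""
        @abstractmethod
        def suspected(self) -> set:
            """Return the identifiers of suspected replicas."""

    class ReplicatedStateMachine:
        """Composes the modules; swapping implementations would yield
        different RSM designs under the same skeletal structure."""
        def __init__(self, ordering, execution, detector):
            self.ordering = ordering
            self.execution = execution
            self.detector = detector

        def execute(self, command: bytes) -> bytes:
            slot = self.ordering.propose(command)
            return self.execution.apply(slot, command)

Under this reading, the H-RSM of contribution (2) below would correspond to one particular choice of ordering module; the point of the framework is that such choices can be made independently.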
(2) When providing strong consistency among replicas distributed across multiple data centers in the cloud, leader-centric protocols, such as the famous Paxos, face two major problems: the high latency of wide-area networks and an unbalanced link-dependency pattern. Based on the idea of executing instances in parallel, this dissertation defines a new RSM, the hierarchical replicated state machine (H-RSM), and designs D-Paxos, a strong consistency protocol that implements an H-RSM and provides high throughput. H-RSM and D-Paxos use resources efficiently through batching and logical pipelining, and they deliver high performance in large-scale cloud environments without sacrificing strong consistency. The dissertation proves that D-Paxos satisfies the safety and liveness properties of H-RSM and can therefore be used to implement one. Moreover, H-RSM and D-Paxos can themselves be derived from the modular abstract framework, which offers another perspective on the framework's practicability.

(3) Performance evaluations of consistency protocols are usually carried out experimentally, and experiments are likewise used here to evaluate D-Paxos's throughput, scalability, and fault tolerance. Experimentation is an important evaluation method in its own right, but it is far more illuminating when combined with quantitative analysis. This dissertation therefore develops an analytical model of D-Paxos to gain a better understanding of how batching and logical pipelining affect its performance (an illustrative model of this kind is sketched below). The analytical model provides a good approximation of the size of pre-ordered request batches, which has been verified experimentally. Guided by the model, the relationships between D-Paxos's performance and related parameters, e.g. latency, request size, batch size, and the number of replicas, are studied in more detail, and the scalability of D-Paxos under batching and logical pipelining is further discussed.

(4) With data replicated on a worldwide scale, the inherent consistency tradeoffs are accentuated by the high communication latencies between data centers. Unlike schemes that address only the consistency-performance tradeoff or the consistency-cost tradeoff, this dissertation proposes an effective scheme for the thornier tradeoff among consistency, cost, and response time in geo-replicated cloud storage systems. In this scheme, a group-based replication framework and the related tradeoff strategies are designed in a middleware layer, allowing geo-replicated cloud storage systems to switch adaptively to an appropriate consistency level at runtime, in consideration of cost and timeliness. With primary/secondary grouping and a combination of active and passive replication, the group-based replication framework provides a strong consistency level as well as weaker ones. The probabilistic approaches used in the cost strategy and the performance strategy determine the target consistency level and target replicas for subsequent requests from performance history collected at runtime, thereby achieving the desired tradeoff among consistency, cost, and response time (a minimal sketch of such runtime level selection is given below).
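The abstract does not reproduce the analytical model of contribution (3), so the following is only a standard-style approximation of how batching and pipelining jointly determine throughput; all symbols (batch size k, pipeline depth w, wide-area round-trip time L, request size S, leader bandwidth W, replica count n) are assumptions of this sketch.

    T(k, w) = \min\!\left( \frac{w\,k}{L + k\,c},\ \frac{1}{c} \right),
    \qquad c = \frac{(n-1)\,S}{W}

In a model of this shape, each consensus instance carries k requests and completes in roughly L + k·c time, so batching amortizes the wide-area round trip over more requests, while w pipelined instances overlap successive rounds until the bandwidth ceiling 1/c is reached. It is this kind of relation that lets a good approximation of the batch size be derived and then checked experimentally.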
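For contribution (4), the abstract names the ingredients (performance history, probabilistic strategies, cost, timeliness) but not their exact form. The Python sketch below is a minimal illustration of runtime consistency-level selection in a middleware layer; the level names, thresholds, and staleness estimate are hypothetical.

    # Hypothetical runtime selection of a consistency level from
    # performance history, balancing consistency, cost and response time.

    LEVELS = ("strong", "bounded", "eventual")   # illustrative level names

    def stale_read_probability(history):
        """Estimate, from recent history, the probability that a weakly
        consistent read returns stale data: here, the fraction of recent
        reads issued before the latest write had propagated."""
        if not history:
            return 0.0
        stale = sum(1 for r in history if r["read_ts"] < r["applied_ts"])
        return stale / len(history)

    def choose_level(history, deadline_ms, cost_budget, rtt_ms, sync_cost):
        """Pick the strongest level whose latency and monetary cost still
        fit this request's deadline and budget; otherwise weigh the
        estimated staleness risk of a cheaper, faster level."""
        if rtt_ms <= deadline_ms and sync_cost <= cost_budget:
            return "strong"        # cross-data-center quorum is affordable
        if stale_read_probability(history) < 0.05:
            return "eventual"      # staleness risk currently acceptable
        return "bounded"           # middle ground between the two

A full strategy would also choose target replicas per request, as the dissertation's cost and performance strategies do; this sketch shows only the level-selection step.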
Keywords/Search Tags: Large-scale cloud storage, replicated state machine, strong consistency protocol, quantitative performance analysis, consistency tradeoffs