
Research On Key Technologies For The Distributed Data Consistency Problem

Posted on: 2017-10-03    Degree: Doctor    Type: Dissertation
Country: China    Candidate: H F Wei    Full Text: PDF
GTID: 1318330512454070    Subject: Computer software and theory
Abstract/Summary:
Since the beginning of the 21st century, large-scale distributed systems and cloud computing have become increasingly widespread. To satisfy the requirements of this new platform and computing paradigm for performance, availability, fault tolerance, and scalability, the underlying data storage systems often employ distributed data technologies, including data partitioning and data replication. This, however, gives rise to the distributed data consistency problem: How should the (upper-layer) applications understand data that are distributed rather than centralized? What does "consistent data" mean? How should applications use distributed data easily and correctly, just as they use (centralized) shared data? Technically, in what order should applications observe the concurrent updates on the underlying distributed data? Furthermore, how can one write and reason about programs given certain ordering properties? As a middleware, a distributed shared data service aims to provide the upper-layer applications with an abstraction of shared data on top of the underlying distributed data. Because of the inherent tradeoffs, at whose heart data consistency lies, no one-size-fits-all solution to the distributed data consistency problem exists. As a result, the distributed data consistency problem has become a challenging research topic in distributed shared data services.

From a historical point of view, the data consistency problem is not peculiar to distributed systems and cloud computing; rather, its study originated in the early days of multiprocessor systems and parallel computing. However, the traditional theory of data consistency, which is "program-oriented" and "correctness-emphasized", does not fit well with the increasingly prominent application values on the new platform and computing paradigm. On the one hand, different applications, or even different entities within the same application, have different requirements for data consistency. A data consistency theory thus needs to integrate data consistency at various levels, and even to integrate both consistent and inconsistent data/states. On the other hand, the correctness criteria for "consistent" data have become much more flexible. A data consistency theory thus needs to interpret data consistency as a spectrum rather than a binary choice between "consistent" and "inconsistent", in order to meet the requirements of applications for more refined, quantified data consistency. To embody the application values above, we propose an "application-oriented" research concept for the data consistency problem, summarized by the slogan "diversified-and-tunable; refined-and-measurable". The "diversified-and-tunable" part requires that a data consistency theory support the requirements of applications for diversified data consistency and allow applications to dynamically choose or tune consistency levels at runtime. The "refined-and-measurable" part requires that a data consistency theory support the requirements of applications for refined data consistency and provide applications with quantitative information about the quality of the data consistency service.

These research problems pose challenges in terms of consistency models, consistency mechanisms, and consistency measurements. Specifically, how can "diversified" consistency models be formally defined, and how can dynamically tunable consistency be achieved in system architectures that are as general as possible? How can "refined/quantified" consistency models be formally defined, and how can consistency models be efficiently verified or mathematically quantified? In this dissertation, we focus on these challenges and make the following main contributions:

1. We first analyze both the history and the trends of research on the data consistency problem. We observe that application values have become more and more prominent in large-scale distributed systems and cloud computing. To embody these application values, we propose an "application-oriented" research concept for the data consistency problem, summarized by the slogan "diversified-and-tunable; refined-and-measurable". We also propose a research idea called "one basis, three dimensions": our research is based on data types (i.e., read/write registers and transactions) and takes into account the dimensions of consistency models, consistency mechanisms, and consistency measurements. The "diversified-and-tunable" part manifests itself in the dimensions of consistency models and consistency mechanisms, while the "refined-and-measurable" part manifests itself in the dimensions of consistency models and consistency measurements.

2. We propose and solve the problem of Verifying Pipelined-RAM Consistency (VPC, for short) over traces of shared read/write registers. Specifically, we study four variants of VPC, namely VPC-SU, VPC-MU, VPC-SD, and VPC-MD, according to (1) whether there are Multiple shared variables (or one Single variable), and (2) whether write operations can assign Duplicate values (or only Unique values) to each shared variable. We prove that the VPC-SD variant (along with the VPC-MD variant) is NP-complete. We also present a polynomial-time algorithm for the VPC-MU variant (along with the VPC-SU variant). The polynomial-time algorithm can be used to test whether a system has correctly maintained Pipelined-RAM consistency, while the NP-completeness result helps to further understand the complexity of weak consistency models.
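
To make the verification problem concrete, the following brute-force Python sketch encodes the Pipelined-RAM definition directly and checks it on a tiny hand-written trace. It is only a reference checker with exponential running time, not the polynomial-time algorithm of this contribution; the operation encoding and the assumption of unique write values (as in the VPC-SU/VPC-MU variants) are choices made here purely for illustration.

    # A naive, exponential-time reference checker that encodes the Pipelined-RAM
    # (PRAM) definition directly. It is only practical for tiny traces and is not
    # the polynomial-time VPC algorithm; the operation encoding is an assumption.
    from itertools import permutations

    # An operation is the tuple (process, seq, kind, variable, value),
    # where kind is 'w' (write) or 'r' (read); seq makes every operation unique.

    def respects_program_order(serial, trace):
        """Check that `serial` preserves each process's program order."""
        pos = {op: i for i, op in enumerate(serial)}
        for ops in trace.values():
            kept = [op for op in ops if op in pos]
            if any(pos[a] > pos[b] for a, b in zip(kept, kept[1:])):
                return False
        return True

    def legal_serialization(serial):
        """Every read must return the most recent preceding write to its variable;
        reads that precede every write must return the assumed initial value None."""
        last = {}
        for _, _, kind, var, val in serial:
            if kind == 'w':
                last[var] = val
            elif last.get(var) != val:
                return False
        return True

    def satisfies_pram(trace):
        """trace maps each process id to its list of operations in program order.
        PRAM holds iff, for every process p, all writes plus p's reads can be
        totally ordered so that program orders are respected and reads are legal."""
        writes = [op for ops in trace.values() for op in ops if op[2] == 'w']
        for ops in trace.values():
            view = writes + [op for op in ops if op[2] == 'r']
            if not any(respects_program_order(s, trace) and legal_serialization(s)
                       for s in permutations(view)):
                return False
        return True

    # p1 writes x:=1 then x:=2, but p2 reads 2 before 1, i.e. against p1's
    # program order, so this trace violates PRAM consistency.
    trace = {
        'p1': [('p1', 0, 'w', 'x', 1), ('p1', 1, 'w', 'x', 2)],
        'p2': [('p2', 0, 'r', 'x', 2), ('p2', 1, 'r', 'x', 1)],
    }
    print(satisfies_pram(trace))  # expected output: False

A definition-level checker like this is feasible only for a handful of operations, which is exactly why a polynomial-time algorithm matters for testing real systems.
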
3. We propose the notion of "almost strong consistency" on read/write registers as an option in the consistency/latency tradeoff. It provides both deterministically bounded staleness of data versions for each read operation and a probabilistic quantification of the rate of "reading stale data", while achieving low latency. We investigate almost strong consistency in terms of probabilistically-atomic 2-atomicity (PA2AM): we formally define PA2AM, design a maintenance protocol for PA2AM and prove its correctness, and quantify the rates of atomicity violations incurred by this protocol. PA2AM not only has the statistically "almost strong" feature with respect to strong consistency, but also shares the performance advantage of weak consistency models such as eventual consistency.
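
The latency/consistency intuition behind almost strong consistency can be illustrated with a toy, in-memory simulation of a single-writer quorum register whose reads complete in one round. This is not the PA2AM maintenance protocol; the replica model, the quorum sizes, and the experiment are assumptions made only to show the two ingredients emphasized above: the staleness of every read is deterministically bounded (by one version), and the rate of stale reads can be measured.

    # A toy, in-memory simulation of a single-writer quorum register whose reads
    # finish in one round (no write-back phase). This is not the PA2AM protocol
    # of the dissertation; the replica model and all names are assumptions used
    # only to illustrate bounded (one-version) staleness and its measured rate.
    import random

    N = 5                       # number of replicas (assumed)
    MAJORITY = N // 2 + 1
    replicas = [{"ts": 0, "val": "v0"} for _ in range(N)]

    def install(ts, val, targets):
        """Deliver the write (ts, val) to the given replicas, keeping newer data."""
        for i in targets:
            if ts > replicas[i]["ts"]:
                replicas[i] = {"ts": ts, "val": val}

    def complete_write(ts, val):
        """A write is complete once it has reached at least a majority."""
        install(ts, val, random.sample(range(N), MAJORITY))

    def fast_read():
        """One-round read: query a majority and return the freshest copy seen.
        Without a write-back phase, a read racing with an in-flight write may
        still return the previous completed version, but never anything older."""
        sample = random.sample(range(N), MAJORITY)
        freshest = max((replicas[i] for i in sample), key=lambda r: r["ts"])
        return freshest["ts"], freshest["val"]

    # Experiment: in each round the previous write has completed and the new
    # write (timestamp ts) is still in flight on a random subset of replicas;
    # count how often a fast read misses it and returns the stale version.
    stale, trials = 0, 10_000
    for ts in range(1, trials + 1):
        install(ts, f"v{ts}", random.sample(range(N), random.randint(1, N)))
        got_ts, _ = fast_read()
        assert got_ts in (ts, ts - 1)   # staleness is bounded by one version
        if got_ts < ts:
            stale += 1
        complete_write(ts, f"v{ts}")    # the in-flight write eventually completes
    print(f"rate of one-version-stale reads: {stale / trials:.2%}")

The printed fraction is the empirical rate at which a one-round read returns the previous version in this toy model, whereas the dissertation quantifies such rates probabilistically for its own protocol.
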
4. By extending snapshot isolation (SI, for short), we propose a new transactional consistency model: Relaxed Version Snapshot Isolation (RVSI, for short). RVSI can formally and quantitatively specify the anomalies it may produce with respect to SI. To this end, we decompose SI into three independent "view properties", and for each of them we introduce a parameter that quantifies one of the three kinds of possible anomalies: k1-BV, k2-FV, and k3-SV. We implement a prototype distributed transactional key-value storage system that achieves RVSI and allows each individual transaction to dynamically tune its consistency level at runtime. Furthermore, RVSI provides an avenue for exploring SI in depth. Preliminary experiments show that RVSI helps to reduce transaction abort rates when applications are willing to tolerate certain anomalies, and that the degree of reduction varies with the workload type.
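
Since the abstract does not describe the prototype's programming interface, the following sketch is purely hypothetical: it only suggests what per-transaction, runtime-tunable RVSI consistency might look like from a client's point of view, with invented names standing in for the three parameters k1-BV, k2-FV, and k3-SV.

    # A purely hypothetical client-side sketch of per-transaction, runtime-tunable
    # RVSI consistency. The prototype's real interface is not described in this
    # abstract, so every name here (RVSISpec, Transaction, k1_bv, k2_fv, k3_sv)
    # is invented for illustration, and the snapshot/commit logic is elided.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class RVSISpec:
        """Per-transaction bounds on the three kinds of RVSI anomalies; zero for
        all three is intended here to mean that no anomaly is tolerated."""
        k1_bv: int = 0   # bound on k1-BV anomalies
        k2_fv: int = 0   # bound on k2-FV anomalies
        k3_sv: int = 0   # bound on k3-SV anomalies

    class Transaction:
        def __init__(self, spec: RVSISpec):
            self.spec = spec
            self.writes = {}

        def read(self, key):
            # A real system would choose a (possibly relaxed) snapshot version
            # of `key` that respects self.spec; elided in this sketch.
            raise NotImplementedError

        def write(self, key, value):
            self.writes[key] = value

        def commit(self) -> bool:
            # A real system would certify the transaction against self.spec
            # and either commit or abort; elided in this sketch.
            raise NotImplementedError

    # Usage sketch: a transaction that tolerates a few anomalies in the hope of
    # a lower abort rate, next to one that requests the strictest setting.
    relaxed = Transaction(RVSISpec(k1_bv=2, k2_fv=1, k3_sv=1))
    strict = Transaction(RVSISpec())
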
Keywords/Search Tags: Distributed data, data consistency, read-write registers, transactions, diversified-and-tunable, refined-and-measurable