Research On Consistency Problem Of Distributed System Based On Raft Consensus Algorithm

Posted on:2020-01-28

Degree:Master

Type:Thesis

Country:China

Candidate:Z K Wang

Full Text:PDF

GTID:2428330602452137

Subject:Engineering

Abstract/Summary:

PDF Full Text Request

Replication state machine,as the primary fault-tolerant means of distributed system,is typically implemented using replication logs.This method maintains that multiple replication logs on a set of servers are identical to each other,and requires each state machine to execute the commands in the log in order,thus ensuring that each server produces the same state and tolerating a few server failures.The replication state machine method needs to use a distributed consensus protocol to ensure that multiple replication logs can agree on the next log entry,and that the data of multiple state machines are consistent with each other in case of failure and message delay.As the core component of the replication state machine,the consensus protocol needs to be implemented concisely and efficiently.However,the mainstream consensus protocols such as Paxos,ViewStamped Replication and Zab are difficult to understand in the algorithm itself,and need additional extensions or performance optimizations in engineering implementation,which is not easy to build the actual system.In addition,building a distributed service based on replication state machine to satisfy a given consistency model requires specific design according to service characteristics and application scenarios,and there is no universal solution.Based on the above problems,the research content of this paper is as follows:(1)This paper analyzes and compares the mainstream consensus protocols and chooses the Raft protocol as the core algorithm based on the requirements of understandability and ease of engineering implementation.However,most of the existing Raft protocol implementations are experimental in nature,lacking in integrity or correctness verification.A few available implementations are all part of a large service and not encapsulated into a common base library form.Therefore,this paper builds a simple and efficient Raft implementation from scratch,which strictly follows the Raft protocol specification and refines the implicit processing that is not clear enough in the protocol,so as to ensure that the implementation of this paper has high accuracy and reliability in various complex fault scenarios.Aiming at the problem that Raft algorithm can not reach agreement for a long time due to log conflicts of different nodes in the case of node crash or network partition failure,this paper proposes an accelerated log backtracking optimization scheme,which reduces the number of message communications needed to resolve hundreds of log entry conflicts by 20%,and can avoid the number of massages increasing linearly with the aggravation of log conflicts,so that it can be maintained in a stable range,and the backward nodes can be updated quickly,which ensures the efficiency and activity of Raft algorithm in these scenarios.(2)In order to verify the effectiveness of the Raft implementation in solving the consistency problem,this paper chooses the Key Value service,which is simple and easy to implement,as the research object of the consistency problem,to avoid introducing additional interference due to the complexity of the service type.Based on the Raft implementation,this paper continues to improve the interaction between Raft module and client,and builds a fault-tolerant distributed KeyValue service.Compared with the commonly used consistency model,aiming at providing simple and intuitive data characteristics,this paper chooses linearizability as the constraint model of Key Value service and ensure that the same client request is executed only once by performing repeated detection and processing on client requests,so as to ensure that the system conforms to the linearizability semantics.The experimental part introduces a distributed test framework,which fully tests the Raft implementation and the Key Value service by simulating various failure scenarios such as client concurrent requests,network partitioning,message transmission delay,loss,out-of-order,repetition and server crash.The high availability of service is verified.By using the open-source linearizability verification module Porcupine,we detect the history of concurrent client operation of the Key Value service under the condition of message delay discarding and network partition failure.The test results prove that the Key Value service is linearizable,and indirectly verify the correctness and reliability of the Raft implementation.

Keywords/Search Tags:

Distributed System, Fault-tolerance, Replication State Machine, Consensus Protocol, Linearizability

PDF Full Text Request

Related items

1	A Vector Clock Based Leaderless Consensus Algorithm
2	Design Of Optimized Log Replication Distributed Consensus Algorithm OLR-Raft Based On Raft
3	Research On Key Technologies Of Fault Tolerance Of Large Scale Distributed Simulation System
4	Research And Design Of Dynamic Byzantine Fault Tolerance Protocol
5	Research And Implementation Of Trusted Storage System Based On Byzantine Fault-tolerance Consensus Algorithm
6	Performance Analysis And Optimization For Virtual Machine Level Fault Tolerance
7	Research On Adaption Method Of Cloud Fault Tolerance Services Based On User Requirement And Resource Constriction
8	Effective Byzantine Fault Tolerance Algorithm Based On Structured Network And Trusted Execution Environment
9	Research And Implementation Of Group Communication System In Fault Tolerance Service
10	Research On Blockchain Consensus Mechanism Based On Byzantine Fault Tolerance