
Research on Low-Load Failure Detection in Distributed Systems

Posted on: 2021-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Deng
Full Text: PDF
GTID: 2518306461452744
Subject: Computer software and theory
Abstract/Summary:
With the development of computer technology, distributed systems have entered a period of rapid growth, and their scale is bound to keep increasing. Applications in medicine, aviation, cloud computing, the Internet of Things and other industries also place higher demands on the availability and reliability of distributed systems, and the failure detection mechanism is one of the means of keeping a distributed system running at high performance. In current distributed systems, under low-load failure detection the timeout derived from a counter takes only discrete values and does not account for recent network fluctuation, which makes failure detection performance unstable. In addition, most failure detectors ignore the influence of link failures in the network and treat a link failure as a node failure. This thesis focuses on two aspects, timeout prediction and link failure, to make failure detection more accurate and to cope with the growing scale and increasingly complex network environment of distributed systems. It makes the following contributions.

Firstly, the thesis analyzes the technologies related to traditional distributed failure detection and describes low-load failure detection in detail. It points out that the weakness of existing detection schemes, when applied to low-load distributed systems, is that the detection timeout only moves back and forth among a few fixed values, so the choice of the best timeout is too limited and detection times are longer as a result. The thesis uses an exponentially weighted moving average algorithm to smooth the round-trip time, predict the first detection timeout, and change the timeout setting used for failure re-detection. On this basis it proposes DSS, a failure detector that predicts the first detection timeout from the round-trip time. In verification, DSS reduces detection time by 28% on average under similar network load and improves the accuracy of failure detection by 11%.

Secondly, further study finds a significant difference between the round-trip time of the first detection process and that of the re-detection process: the round-trip time of the second detection is significantly greater than that of the first. If both stages continue to rely on a single timeout setting, the failure detection time becomes inaccurate, which raises the false positive rate and degrades detection performance. The thesis therefore sets up two sliding windows, one for each stage of failure detection, and proposes a dual-window sliding failure detection method, TWDS. The method separates the observed round-trip times by stage, predicts the first detection timeout and the re-detection timeout separately, designs the failure detection interval from the predicted timeouts, and thereby provides the basis for judging a suspected timeout. In experiments, TWDS reduces detection time by 39% and the false positive rate by 22%.

Thirdly, most current failure detectors ignore link failures and simply treat them as node failures, which harms system availability. The thesis proposes a link failure detection protocol, TNFD, to detect node failures and locate link failures in the system. In experiments, TNFD reduces detection time by 38% and accurately distinguishes node failures from link failures.

In summary, the thesis approaches low-load failure detection from two directions: the failure detection timeout mechanism and link failure. The former changes the dynamic timeout-setting mechanism of low-load failure detection, and the latter sets up multi-node detection that accurately distinguishes the fault types in the system. Together they provide a reference for the design of low-load failure detection methods and some ideas for timeout prediction and for distinguishing failure types.
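The timeout prediction idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's DSS or TWDS implementation: the class names, the smoothing factor `alpha`, the safety multiplier `margin`, and the default timeout are all assumptions made for illustration, and the two sliding windows are simplified here to two independent exponentially weighted moving average estimators, one for first-detection round-trip times and one for re-detection round-trip times.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EwmaTimeout:
    """Predicts a timeout from an exponentially weighted moving average of RTTs.

    alpha and margin are illustrative assumptions, not values from the thesis.
    """
    alpha: float = 0.125              # weight given to the newest RTT sample
    margin: float = 2.0               # safety multiplier applied to the smoothed RTT
    smoothed: Optional[float] = None  # current smoothed RTT, None until first sample

    def observe(self, rtt: float) -> None:
        """Fold one measured round-trip time (seconds) into the average."""
        if self.smoothed is None:
            self.smoothed = rtt
        else:
            self.smoothed = (1 - self.alpha) * self.smoothed + self.alpha * rtt

    def timeout(self, default: float = 1.0) -> float:
        """Return the predicted timeout, or `default` before any sample arrives."""
        if self.smoothed is None:
            return default
        return self.margin * self.smoothed


class DualTimeoutPredictor:
    """Keeps separate estimators for the first detection and the re-detection stage."""

    def __init__(self) -> None:
        self.first = EwmaTimeout()
        self.retry = EwmaTimeout()

    def observe(self, rtt: float, is_retry: bool) -> None:
        """Route the sample to the estimator of the stage it was measured in."""
        (self.retry if is_retry else self.first).observe(rtt)

    def timeout(self, is_retry: bool) -> float:
        """Predicted timeout for the requested stage."""
        return (self.retry if is_retry else self.first).timeout()
```

Keeping the two stages apart means that a slow re-detection round trip does not inflate the first-detection timeout, which is the motivation the abstract gives for treating the stages separately.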
Keywords/Search Tags:Distributed System, Failure Detection, Low Load, Node Failure, Timeout Prediction, Link Failure