Research On Distributed System Fault Detection And Anomaly Detection Technology

Posted on: 2022-03-22
Degree: Master
Type: Thesis
Country: China
Candidate: C H Huang
Full Text: PDF
GTID: 2518306569459164
Subject: Software engineering
Abstract/Summary:
With the development of distributed technology, distributed systems are being applied in more and more fields. For distributed systems in fields that directly affect people's daily lives, such as finance, medical care, and the Internet, a failure affects many users and can cause huge losses. To reduce the impact on users and avoid greater losses, faults and anomalies in the system must be detected promptly and efficiently. This thesis therefore studies fault detection and anomaly detection technology for distributed systems. The main work is as follows. First, to improve the availability of the heartbeat detector and avoid a single point of failure, this thesis uses a cluster of detectors to monitor the heartbeats of the services in the system. The detector cluster must elect a Leader node to coordinate the entire cluster. This thesis applies the idea of dynamic voting to improve the master-slave (leader) election algorithm in Raft: the number of votes required is determined from the currently active members of the cluster, which gives faster convergence and higher availability during elections and avoids the "split brain" problem. Second, this thesis combines unsupervised and supervised algorithms for anomaly detection. The anomaly detection process is divided into two stages. In the first stage, five unsupervised algorithms examine each sample point simultaneously and vote to select suspected anomalies; in the second stage, the supervised algorithm LightGBM (Light Gradient Boosting Machine) re-examines the suspected points. Third, this thesis compares the adopted anomaly detection algorithm with several classic anomaly detection algorithms. Experimental results show that the combined unsupervised and supervised algorithm outperforms these classic algorithms in accuracy, recall, and F1 score. Finally, this thesis designs a prototype system for fault detection and anomaly detection, divided into three modules: a user client, a heartbeat detector, and a heartbeat detection development library. The modules are loosely coupled and the system is highly scalable.
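
The two-stage anomaly detection flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not name its five unsupervised algorithms or its voting threshold, so three simple statistical detectors (z-score, median absolute deviation, Tukey fences) stand in for them, `min_votes` is a hypothetical parameter, and the second stage is a pluggable `confirm` callable where the thesis uses a trained LightGBM model. All function names are invented for this example.

```python
from statistics import mean, median, pstdev, quantiles
from typing import Callable, List, Sequence

# A detector maps a series to one boolean flag per point (assumed interface).
Detector = Callable[[Sequence[float]], List[bool]]


def zscore_flags(xs: Sequence[float], k: float = 2.0) -> List[bool]:
    """Flag points more than k population standard deviations from the mean."""
    mu, sigma = mean(xs), pstdev(xs)
    if sigma == 0:
        return [False] * len(xs)
    return [abs(x - mu) / sigma > k for x in xs]


def mad_flags(xs: Sequence[float], k: float = 3.5) -> List[bool]:
    """Flag points more than k median absolute deviations from the median."""
    med = median(xs)
    mad = median([abs(x - med) for x in xs])
    if mad == 0:
        return [False] * len(xs)
    return [abs(x - med) / mad > k for x in xs]


def iqr_flags(xs: Sequence[float], k: float = 1.5) -> List[bool]:
    """Flag points outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(xs, n=4)
    iqr = q3 - q1
    return [x < q1 - k * iqr or x > q3 + k * iqr for x in xs]


def suspected_by_vote(xs: Sequence[float], detectors: Sequence[Detector],
                      min_votes: int) -> List[int]:
    """Stage 1: return indices flagged by at least min_votes detectors."""
    votes = [0] * len(xs)
    for detect in detectors:
        for i, flagged in enumerate(detect(xs)):
            votes[i] += int(flagged)
    return [i for i, v in enumerate(votes) if v >= min_votes]


def two_stage_detect(xs: Sequence[float], detectors: Sequence[Detector],
                     min_votes: int,
                     confirm: Callable[[float], bool]) -> List[int]:
    """Stage 2: keep only the suspects that the supervised model confirms."""
    suspects = suspected_by_vote(xs, detectors, min_votes)
    return [i for i in suspects if confirm(xs[i])]


# Hypothetical response-time samples (ms) with one obvious spike at index 9.
latencies_ms = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.3, 9.7, 10.1, 25.0, 10.0, 9.9]
detectors = [zscore_flags, mad_flags, iqr_flags]

suspects = suspected_by_vote(latencies_ms, detectors, min_votes=2)
# Stand-in for a trained classifier: a fixed threshold, purely illustrative.
confirmed = two_stage_detect(latencies_ms, detectors, 2, confirm=lambda x: x > 20.0)
print(suspects, confirmed)
```

One plausible motivation for this split is cost: the cheap unsupervised vote filters the stream so that the supervised model only has to score the small set of suspected points.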
Keywords/Search Tags:Distributed System, Fault Detection, Anomaly Detection, Master-slave Election