
Design And Implementation Of AIOps Root Cause Analysis System Based On Knowledge Graph

Posted on: 2022-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: A Y Luo
Full Text: PDF
GTID: 2518306740495214
Subject: Computer technology
Abstract/Summary:
With the rapid development of virtualization technology, distributed systems are becoming larger and more complex. When such a system fails, it is difficult for on-call personnel to find the root cause quickly, and the system remains unstable in the meantime. As traditional operation and maintenance becomes harder to sustain, artificial intelligence for IT operations (AIOps) has been put on the agenda. Locating the root cause of a fault in time and keeping the system running safely and stably are among the most basic functions of AIOps, so root cause analysis methods for distributed systems are an important research topic that is attracting growing attention.

Methods based on dependency graphs and causality graphs have achieved good results in fault root cause analysis for complex systems, but they have two limitations: 1. They do not study how to use the explicit knowledge contained in historical data to guide the current root cause analysis; 2. They cannot fully explain the fault triggering path at the event level. To address these limitations, this thesis constructs a fault knowledge graph for each type of fault, proposes a fault root cause analysis method that combines a graph convolutional network with the knowledge graph, and designs and implements a root cause analysis system based on the knowledge graph. The main contributions of this thesis are:

1) This thesis constructs a fault knowledge graph for each type of fault. Relationships between events are mined from the historical fault data of a distributed system, and a fault propagation graph is constructed for each fault. A merging algorithm then extracts the structure shared by the propagation graphs of faults of the same type, which yields the fault knowledge graph for that fault type. Finally, experiments are carried out to test and evaluate the construction method.

2) This thesis proposes a fault root cause analysis method that combines a graph convolutional network with the knowledge graph. An online fault propagation graph is constructed in real time, and a graph convolutional neural network compares its similarity with each fault knowledge graph to determine the actual fault type. The correspondence between the fault type and its abstract root cause events is then used, together with a ranking strategy, to locate the specific root cause events. Experiments are carried out on a distributed system on the Alibaba Cloud platform, with several groups of comparative experiments to verify the effectiveness of the proposed method. The experimental results show that the proposed method achieves higher precision than the other root cause analysis methods compared.

3) This thesis designs and implements a root cause analysis system based on the knowledge graph, built on the methods above. Testing shows that the system meets its functional and performance requirements.

In summary, to address fault root cause analysis in AIOps, this thesis constructs a fault knowledge graph, proposes a root cause analysis method based on it, and implements the corresponding system. Compared with existing systems, this system further improves the precision of fault root cause analysis.
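The pipeline described in the abstract (merge historical propagation graphs into one knowledge graph per fault type, match an online propagation graph against them, then rank candidate root cause events) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not specify the merging algorithm, so a simple edge-support threshold is assumed; the graph convolutional network similarity is replaced by plain Jaccard similarity over edge sets; and the ranking heuristic (out-degree minus in-degree) is a stand-in. All function names, event names, and fault types below are hypothetical.

```python
from collections import Counter

# A directed edge (cause_event, effect_event) in a fault propagation graph.
Edge = tuple[str, str]


def merge_propagation_graphs(graphs: list[set[Edge]], min_support: float = 0.6) -> set[Edge]:
    """Assumed merging rule: keep edges present in at least `min_support` of the graphs."""
    counts = Counter(edge for graph in graphs for edge in graph)
    threshold = min_support * len(graphs)
    return {edge for edge, n in counts.items() if n >= threshold}


def jaccard(a: set[Edge], b: set[Edge]) -> float:
    """Edge-set Jaccard similarity; a simple stand-in for the GCN-based similarity."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify_fault(online: set[Edge], knowledge: dict[str, set[Edge]]) -> str:
    """Return the fault type whose knowledge graph is most similar to the online graph."""
    return max(knowledge, key=lambda fault_type: jaccard(online, knowledge[fault_type]))


def rank_root_causes(online: set[Edge]) -> list[str]:
    """Stand-in ranking: events that mostly cause others (out-degree minus in-degree) come first."""
    score: Counter = Counter()
    for cause, effect in online:
        score[cause] += 1
        score[effect] -= 1
    return sorted(score, key=lambda event: (-score[event], event))


# Hypothetical historical faults, grouped by fault type.
history = {
    "disk_full": [
        {("disk_usage_high", "write_error"), ("write_error", "service_timeout")},
        {("disk_usage_high", "write_error"), ("write_error", "service_timeout"),
         ("service_timeout", "alert")},
    ],
    "network_partition": [
        {("packet_loss", "rpc_timeout"), ("rpc_timeout", "service_timeout")},
        {("packet_loss", "rpc_timeout")},
    ],
}
knowledge = {fault_type: merge_propagation_graphs(graphs) for fault_type, graphs in history.items()}

# Hypothetical online propagation graph built from current events.
online = {
    ("disk_usage_high", "write_error"),
    ("write_error", "service_timeout"),
    ("service_timeout", "alert"),
}
fault_type = classify_fault(online, knowledge)
candidates = rank_root_causes(online)
print(fault_type, candidates[0])
```

In this toy data the online graph shares two of its three edges with the merged `disk_full` graph and none with `network_partition`, so it is matched to `disk_full`, and the event with no upstream cause is ranked first.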
Keywords/Search Tags: Root Cause Analysis, Knowledge Graph, Graph Convolution Network, AIOps