| With the development of computer technology, the scale and complexity of IT systems continue to grow, and the volume of operational data has increased significantly. Traditional operations methods cannot meet the demands that digital transformation brings across industries. In fault tracing during system operation, the failure of a single system can generate a large number of related alarms; troubleshooting them requires extensive experience and considerable time and effort, which makes handling inefficient and leads to large economic losses from downtime. It is therefore necessary to explore effective methods for mining and analyzing operational data, so that latent operational knowledge can be extracted from monitoring data, enabling intelligent operations, improving the efficiency of fault tracing, and reducing the economic losses caused by failures. This thesis aims to meet the need for fast and accurate troubleshooting in IT operations fault tracing scenarios, and divides fault tracing into two parts: alarm correlation analysis and root cause alarm localization. In alarm correlation analysis, the commonly used sequential pattern mining methods must be rerun whenever the system state changes or operational data is added incrementally, which wastes system resources. Moreover, the mining results contain many sequential patterns that are statistically significant but invalid, and these do not help in locating the root cause of a fault. In root cause alarm localization, labeled associations between faults and alarms are lacking; such labels are typically produced by combining alarm descriptions with judgment based on experience. Furthermore, the number of alarms actually associated with faults in production is relatively small, and an efficient, feasible root alarm localization process is needed, together with an evaluation coefficient that distinguishes among multiple root alarms. To address these issues, this thesis focuses on the following research content: (1) Optimization of Incremental Sequential Pattern Mining for Alarm Data. Alarm data in IT systems grows incrementally during operation. Static sequential pattern mining must rescan the database repeatedly to update the frequent patterns whenever the data changes. In this thesis, an incremental sequential pattern mining algorithm is used for correlation analysis of operational alarm data. A data structure called the semi-frequent sequence tree stores the results of sequential pattern mining, making full use of previous mining results and a semi-frequent buffer mechanism to reduce the number of projected databases built during updates. This approach efficiently updates the frequent sequential patterns when the support threshold changes or new data arrives. (2) Filtering Redundant Frequent Sequential Patterns. Based on how alarms propagate and spread during abnormal IT system operation, the causal relationship between root alarms and their associated alarms is used to filter out redundant frequent sequential patterns. This improves the efficiency of semi-frequent sequence tree updates and the reliability of the association rules. (3) Root Cause Alarm Localization Based on a Fault Correlation Graph Model. To apply alarm correlations efficiently in operational fault tracing scenarios and to reflect the characteristics of root cause alarms, a layered structure for the fault correlation graph model and a root cause alarm localization procedure are designed. In addition, an evaluation coefficient with a degree of interpretability is used to verify experimentally the effectiveness and feasibility of the algorithm. |
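
The semi-frequent buffer idea in item (1) can be illustrated with a small sketch. This is not the thesis's semi-frequent sequence tree: it is a deliberately simplified stand-in that tracks only length-2 patterns ("alarm a occurs before alarm b") in a flat dictionary, with no projected databases. The class name `SemiFrequentBuffer`, the helper `pair_support`, the absolute-count threshold `min_sup`, and the buffer ratio `mu` are all assumptions introduced for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_support(sequences):
    """Count, for each ordered pair (a, b), the sequences in which a occurs before b."""
    counts = Counter()
    for seq in sequences:
        seen = set()
        for i, j in combinations(range(len(seq)), 2):  # always i < j
            if seq[i] != seq[j]:
                seen.add((seq[i], seq[j]))
        counts.update(seen)  # each sequence contributes at most 1 per pair
    return counts


class SemiFrequentBuffer:
    """Hypothetical, simplified buffer of frequent and semi-frequent pair patterns."""

    def __init__(self, sequences, min_sup, mu=0.5):
        self.min_sup = min_sup            # absolute support threshold (assumption)
        self.buffer_sup = mu * min_sup    # lower bound for semi-frequent patterns
        counts = pair_support(sequences)
        # Keep patterns that are frequent or semi-frequent; drop the rest.
        self.tracked = {p: c for p, c in counts.items() if c >= self.buffer_sup}
        # Support gained since the initial scan by patterns that were dropped.
        self.pending = Counter()

    def frequent(self):
        return {p: c for p, c in self.tracked.items() if c >= self.min_sup}

    def update(self, new_sequences):
        """Apply an increment; return untracked patterns that may need a full rescan."""
        needs_rescan = set()
        for pattern, count in pair_support(new_sequences).items():
            if pattern in self.tracked:
                # Exact support is known, so promotion needs no rescan.
                self.tracked[pattern] += count
            else:
                # Old support was below buffer_sup, so the true support is
                # below buffer_sup + pending; flag only if that bound
                # allows the pattern to be frequent.
                self.pending[pattern] += count
                if self.pending[pattern] + self.buffer_sup > self.min_sup:
                    needs_rescan.add(pattern)
        return needs_rescan
```

Under these assumptions, a semi-frequent pattern that becomes frequent after an increment is promoted from its stored count alone, and the original data only has to be revisited for the patterns returned by `update`. The flag is conservative: a returned pattern may still turn out to be infrequent once its exact support is counted.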