| Large-scale software systems are now widely used across industries because of the diverse services they provide, and the quality of those services strongly affects user experience. However, abnormal events in software systems degrade service quality. System operation & maintenance management aims to predict abnormal events and restore the service quality of the software system, so reliable operation & maintenance management is essential to keeping large-scale systems stable. Disk failure is the primary problem affecting the service quality of systems. By capturing abnormal information in disk status data, disk failures can be predicted effectively and the data stored on the disks can be backed up in time, reducing the adverse effect on service quality. System logs are an important source of information for tracking and recording the operating status of software systems, and administrators can detect system abnormalities by analyzing log sequence data. However, the concurrency of large-scale systems makes the order of log sequences chaotic, and the complexity of these systems prevents humans from performing operation & maintenance management efficiently. Therefore, to make system operation & maintenance management more automated and intelligent, this paper builds an anomaly event detection platform for large-scale software systems. The platform is driven by disk status data and system log data, and its core consists of a disk failure prediction method and a log anomaly detection method; it significantly reduces the cost of manual operation & maintenance management. The main contents of this paper are as follows. For disk failure prediction, in order to fully mine the abnormal information in disk status data, this paper treats the inconsistency between the data distributions of the healthy state and the abnormal state of a damaged disk as an abnormal feature, and uses a Siamese neural network to capture this feature, achieving high performance in disk failure prediction. For log anomaly detection, to address the problem of disordered log sequences, this paper introduces a permuted event modeling method that weakens the sequential properties of log sequences. In addition, this paper proposes a generative adversarial network based on the attention mechanism, which exploits the attention mechanism's insensitivity to sequence order to reduce the influence of disordered log sequences. Finally, to automate system operation & maintenance management, this paper builds an anomaly event detection platform for large-scale software systems based on the above methods. The platform uses historical operation & maintenance data to promote the automation and intelligence of operation & maintenance management for large-scale systems. |