
Research And Implementation Of An Anomaly Detection Platform For Large-scale Software Systems Based On Large Collections Of Log Messages

Posted on: 2018-08-07
Degree: Master
Type: Thesis
Country: China
Candidate: J Lu
GTID: 2348330518999388
Subject: Engineering
Abstract/Summary:
With the growing popularity of cloud computing technology, large-scale systems often consist of hundreds of software components running on thousands of computing nodes. Execution anomaly detection is essential for the development, maintenance, and performance tuning of such large-scale distributed systems. Their runtime data are continuously collected and stored in log files, so the console logs these systems produce are often the most important source of information for troubleshooting and problem diagnosis. However, manually inspecting system logs to detect anomalies is infeasible given the growing volume and complexity of log files. There is therefore a substantial demand for automatic anomaly detection approaches based on log analysis.

To address these problems, this thesis implements an anomaly detection platform based on massive collections of log messages. The platform solves the problem of managing massive log files, and it addresses the difficulty of rapidly detecting and locating anomalies in highly complex systems. It consists of five modules: dashboard, log management, application management, log retrieval, and anomaly detection.

Anomaly detection is the core module of this thesis. In this module, we propose a general method that detects runtime problems by mining log information, without requiring manual intervention. We evaluate it on a small-scale CloudStack system and a Hadoop production system. The results show that our method detects runtime anomalies effectively compared with three existing detection algorithms: principal component analysis (PCA), a sampling algorithm, and a clustering algorithm.

The implementation of the anomaly detection algorithm consists of the following four parts:
1. Source code analysis. This step takes the program source code as input. We extract the set of log statements from the source code by analyzing its abstract syntax tree (AST) and generate a reachability graph that reveals the reachability relation between any two log statements. A log template is a structured definition of a log print statement.
2. Log parsing. First, each log message is parsed to obtain its line number, timestamp, event level, provenance, and text content. The log file is then parsed into log messages by extracting the valid information and combining it with the log templates.
3. Execution trace extraction. This step partitions a set of log messages into multiple execution traces. To do so, we propose an execution trace extraction algorithm. First, we extract the distinct traces from the set of log messages according to the reachability relations revealed by the reachability graph. Second, we define a similarity criterion and group fragments that share the same basic segment but differ in the number of repetitions into the same type.
4. Anomaly detection. We propose a novel anomaly detection algorithm based on a probabilistic suffix tree. In this step, we first define a trace anomaly index for each trace. We then treat the traces as sequence data and determine whether the sequences are anomalous according to their similarity across the entire network. Finally, the anomaly index is calculated by combining the structure and the number of traces.
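Step 1 (source code analysis) extracts log statements by walking the program's AST. The evaluated systems, CloudStack and Hadoop, are Java code bases, and the thesis does not show its extractor. As a minimal stand-in, the Python sketch below applies the standard `ast` module to Python source; the logger method names in `LOG_LEVELS`, the `LogStatement` record, and the sample function are hypothetical. Each constant format string it recovers plays the role of a log template.

```python
import ast
from dataclasses import dataclass
from typing import List

# Hypothetical: logger method names treated as "log print statements".
LOG_LEVELS = {"debug", "info", "warning", "error", "critical"}


@dataclass(frozen=True)
class LogStatement:
    lineno: int    # line of the log call in the source file
    level: str     # logger method name, e.g. "info"
    template: str  # constant format string passed as the first argument


def extract_log_statements(source: str) -> List[LogStatement]:
    """Collect calls of the form <obj>.<level>("literal", ...) from the AST."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in LOG_LEVELS
                and node.args
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, str)):
            found.append(LogStatement(node.lineno, node.func.attr, node.args[0].value))
    # ast.walk yields nodes in no specified order, so restore source order.
    return sorted(found, key=lambda s: s.lineno)


# Hypothetical sample program used only to exercise the extractor.
SAMPLE = '''
import logging
log = logging.getLogger(__name__)

def write_block(block_id):
    log.info("Starting block %s", block_id)
    if block_id is None:
        log.error("Missing block id")
    log.info("Finished block %s", block_id)
'''

for stmt in extract_log_statements(SAMPLE):
    print(stmt.lineno, stmt.level, repr(stmt.template))
```

Matching on the method name alone is a simplification: a real extractor would also resolve which object is a logger and handle templates built by string concatenation.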
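The reachability relation in step 1 can be read as the transitive closure of a graph whose nodes are log statements. The sketch below assumes such a graph has already been derived from the program's control flow (the `EDGES` dictionary and its event names are hypothetical) and computes the closure with a breadth-first search from every node; the thesis does not state which closure algorithm it uses.

```python
from collections import deque
from typing import Dict, Set

# Hypothetical control-flow edges between log statements: an edge a -> b
# means statement b can be the next log statement printed after a.
EDGES: Dict[str, Set[str]] = {
    "open": {"read", "close"},
    "read": {"read", "close"},  # a loop: "read" may repeat
    "close": set(),
}


def reachability(edges: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Transitive closure: for each node, every node reachable in >= 1 step."""
    nodes = set(edges) | {v for succ in edges.values() for v in succ}
    reach: Dict[str, Set[str]] = {}
    for start in nodes:
        seen: Set[str] = set()
        queue = deque(edges.get(start, ()))
        while queue:
            node = queue.popleft()
            if node not in seen:
                seen.add(node)
                queue.extend(edges.get(node, ()))
        reach[start] = seen
    return reach


REACH = reachability(EDGES)
print(sorted(REACH["open"]))    # ['close', 'read']
print("open" in REACH["read"])  # False: nothing leads back to "open"
```

Running one search per node costs O(V·(V+E)), which is acceptable for the number of log statements in a typical code base.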
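Step 2 (log parsing) splits each line into its header fields and then matches the free-text content against the log templates. The sketch below assumes a hypothetical line layout of `<date> <time> <LEVEL> [<source>] <message>` and printf-style `%s`/`%d` placeholders in templates; real Hadoop or CloudStack layouts depend on their logging configuration. `LogMessage`, `template_to_regex`, and `parse_log` are illustrative names, not the thesis's.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical line layout: "<date> <time> <LEVEL> [<source>] <message>".
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:,\d{3})?)\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<source>[^\]]+)\]\s+(?P<content>.*)$"
)


@dataclass
class LogMessage:
    lineno: int
    timestamp: str
    level: str
    source: str                        # provenance, e.g. the emitting class
    content: str
    template_id: Optional[int] = None  # index of the matched log template
    params: Tuple[str, ...] = ()       # values substituted into the template


def template_to_regex(template: str) -> "re.Pattern[str]":
    """Escape the literal text and turn %s / %d placeholders into groups."""
    parts = re.split(r"%[sd]", template)
    return re.compile("^" + "(.+?)".join(re.escape(p) for p in parts) + "$")


def parse_log(lines: List[str], templates: List[str]) -> List[LogMessage]:
    regexes = [template_to_regex(t) for t in templates]
    messages = []
    for lineno, line in enumerate(lines, start=1):
        m = LINE_RE.match(line)
        if m is None:
            continue  # line does not follow the assumed layout
        msg = LogMessage(lineno, m["ts"], m["level"], m["source"], m["content"])
        for tid, rx in enumerate(regexes):
            hit = rx.match(msg.content)
            if hit:  # first matching template wins
                msg.template_id, msg.params = tid, hit.groups()
                break
        messages.append(msg)
    return messages


# Hypothetical templates and log lines for illustration.
TEMPLATES = ["Starting block %s", "Missing block id", "Finished block %s"]
LINES = [
    "2018-03-01 12:00:00,101 INFO [dfs.DataNode] Starting block blk_42",
    "2018-03-01 12:00:01,202 INFO [dfs.DataNode] Finished block blk_42",
    "garbage line",
]
for msg in parse_log(LINES, TEMPLATES):
    print(msg.lineno, msg.level, msg.template_id, msg.params)
```

Taking the first matching template is a simplification; when templates overlap, trying the most specific ones first avoids misattribution.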
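The abstract does not spell out the partitioning rule or the similarity standard used in step 3. The sketch below is one hypothetical reading: each event is appended to the most recent trace whose last event can reach it (otherwise it starts a new trace), and two traces are treated as the same type when collapsing consecutive repetitions of any segment yields the same sequence. `REACH`, the event names, and all function names are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical reachability relation (e.g. the output of a closure step):
# REACH[a] holds every log event that can follow event a in one execution.
REACH: Dict[str, Set[str]] = {
    "open": {"read", "close"},
    "read": {"read", "close"},
    "close": set(),
}


def extract_traces(events: Sequence[str], reach: Dict[str, Set[str]]) -> List[List[str]]:
    """Greedily append each event to the most recent trace whose last
    event can reach it; otherwise the event starts a new trace."""
    traces: List[List[str]] = []
    for event in events:
        for trace in reversed(traces):
            if event in reach.get(trace[-1], set()):
                trace.append(event)
                break
        else:
            traces.append([event])
    return traces


def collapse_repeats(trace: Sequence[str]) -> Tuple[str, ...]:
    """Remove consecutive repetitions of any segment: A B C B C D -> A B C D."""
    seq = list(trace)
    changed = True
    while changed:
        changed = False
        for k in range(1, len(seq) // 2 + 1):
            for i in range(len(seq) - 2 * k + 1):
                if seq[i:i + k] == seq[i + k:i + 2 * k]:
                    del seq[i + k:i + 2 * k]  # drop the repeated copy
                    changed = True
                    break
            if changed:
                break  # restart the scan on the shortened sequence
    return tuple(seq)


def group_by_type(traces: List[List[str]]) -> Dict[Tuple[str, ...], List[List[str]]]:
    """Traces sharing the same basic segment structure fall into one type."""
    groups: Dict[Tuple[str, ...], List[List[str]]] = defaultdict(list)
    for trace in traces:
        groups[collapse_repeats(trace)].append(trace)
    return dict(groups)


# Two interleaved executions of the hypothetical open/read/close path.
EVENTS = ["open", "read", "open", "read", "read", "close", "close"]
TRACES = extract_traces(EVENTS, REACH)
print(TRACES)
print(group_by_type(TRACES))
```

Reachability alone cannot always separate interleaved executions of the same code path; in practice an identifier in the message (such as a block or request ID) usually has to be combined with it.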
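Step 4 builds on a probabilistic suffix tree (PST), a variable-order Markov model whose nodes are suffix contexts. The abstract gives neither the tree construction nor the exact trace anomaly index, so the Python sketch below is a simplified, hypothetical version: it stores next-event counts for every context up to `max_depth`, predicts with the longest suffix seen at least `min_count` times, and scores a trace by its average negative log-likelihood. A full PST typically also prunes contexts whose predictions do not differ meaningfully from their parent's. `SimplePST`, its parameters, the smoothing, and the threshold rule are assumptions; the thesis's index additionally combines trace structure and trace counts, which this sketch does not model.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


class SimplePST:
    """Simplified probabilistic suffix tree (a variable-order Markov model).

    Each node is a context: a tuple of up to `max_depth` preceding events.
    It stores how often each event followed that context in training."""

    def __init__(self, max_depth: int = 3, min_count: int = 2, alpha: float = 0.5):
        self.max_depth = max_depth
        self.min_count = min_count  # rarer contexts fall back to shorter ones
        self.alpha = alpha          # additive smoothing constant
        self.counts: Dict[tuple, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
        self.alphabet = set()

    def fit(self, sequences: List[Sequence[str]]) -> "SimplePST":
        for seq in sequences:
            self.alphabet.update(seq)
            for i, sym in enumerate(seq):
                # Count sym after every suffix of its history, from () up to max_depth.
                for d in range(min(i, self.max_depth) + 1):
                    self.counts[tuple(seq[i - d:i])][sym] += 1
        return self

    def _node(self, history: Sequence[str]) -> Dict[str, int]:
        # Use the longest suffix of the history that was seen often enough.
        for d in range(min(len(history), self.max_depth), 0, -1):
            node = self.counts.get(tuple(history[-d:]))
            if node is not None and sum(node.values()) >= self.min_count:
                return node
        return self.counts[()]  # root: unconditional event frequencies

    def prob(self, history: Sequence[str], sym: str) -> float:
        node = self._node(history)
        total = sum(node.values())
        v = len(self.alphabet) + 1  # one extra slot reserves mass for unseen events
        return (node.get(sym, 0) + self.alpha) / (total + self.alpha * v)

    def anomaly_score(self, seq: Sequence[str]) -> float:
        """Average negative log-likelihood per event; higher means more unusual."""
        if not seq:
            return 0.0
        nll = -sum(math.log(self.prob(seq[:i], s)) for i, s in enumerate(seq))
        return nll / len(seq)


# Hypothetical normal traces, e.g. produced by the trace extraction step.
NORMAL = [["open", "read", "close"]] * 5 + [["open", "read", "read", "close"]] * 5
pst = SimplePST().fit(NORMAL)
# Hypothetical decision rule: flag anything scoring above every training trace.
threshold = max(pst.anomaly_score(t) for t in NORMAL)
for trace in (["open", "read", "close"], ["open", "close", "read"]):
    score = pst.anomaly_score(trace)
    print(trace, round(score, 3), "ANOMALY" if score > threshold else "ok")
```

The second trace is flagged because `close` never follows `open` directly in the training data, so the model assigns that step a low probability.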
Keywords/Search Tags: Large-scale Software Systems, Anomaly Detection, Source Code Analysis, Log Parsing, Execution Trace Extraction