Research On Syslog Anomaly Detection Based On Three-way Decision Incremental Clustering

Posted on: 2022-09-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y Tan
Full Text: PDF
GTID: 2518306536454564
Subject: Computer Science and Technology
Abstract/Summary:
With the widespread adoption of modern large-scale distributed systems, system security services face major challenges. Because these systems run online around the clock, any abnormal event may bring the system down and cause heavy economic losses. Detecting such abnormalities in time, before they disrupt normal operation, requires anomaly detection, and system logs are its main data source. As logs have grown more complex, log anomaly detection methods that use clustering as their data-driven core have been studied and developed extensively. However, clustering-based log anomaly detection still has two problems. First, most standard clustering techniques use a batch learning mode. As the volume of logs grows, both the running time and the model update process of clustering-based anomaly detection become expensive, so the method needs to be improved. Second, current research seldom considers the problem of overlapping cluster domains. Mainstream clustering methods are based on the idea of two-way decisions. When the available information is insufficient or the difference between objects is small, cluster domains tend to overlap, and because the model cannot clearly define the boundaries of the class domains, anomaly detection makes a large number of decision errors. To address these problems, the main research contents of this thesis are as follows:

(1) To solve the problems of log clustering and model updating in batch learning scenarios, an incremental log clustering algorithm (IClustering) is proposed. The log set is divided into several data blocks that are processed incrementally. By constructing a guidance process, subsequent data needs only a single traversal over the existing knowledge system to update the anomaly detection model gradually, without recalculating historical data, which greatly reduces the cost of log clustering and model updates. Experimental results show that, compared with two traditional clustering algorithms, IClustering effectively addresses the difficulty of clustering and the complexity of model updates in batch learning mode.

(2) To address the decision errors caused by overlapping cluster domains in clustering models based on two-way decisions, three-way decision theory is introduced into the anomaly detection model. Boundary-domain samples are computed from two perspectives, inside the cluster and outside it, and samples in different regions follow different decision branches, which reduces wrong decisions. Experimental results show that introducing three-way decisions effectively reduces the wrong decisions caused by overlapping cluster domains in the anomaly detection model.

(3) Because the TWD-ICM model has high computational complexity when searching for boundary samples during model training, a parallel TWD-ICM algorithm based on Spark is proposed. Drawing on the characteristics of RDDs, the core ideas and operation steps of the parallel algorithm are given for each computing layer: the data partition layer, the intra-cluster boundary sample calculation layer, and the inter-cluster boundary sample calculation layer. Experiments verify the effectiveness and feasibility of the parallel TWD-ICM algorithm on Spark.
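The combination of block-wise incremental clustering and three-way decisions described in the abstract can be illustrated with a minimal sketch. It is not the thesis's IClustering or TWD-ICM algorithm, whose details the abstract does not give. It assumes that each log entry has already been turned into a numeric feature vector, that similarity is Euclidean distance to a cluster centroid, and that two hypothetical thresholds `alpha < beta` split the decision into accept, boundary (deferred), and reject. The names `Cluster`, `three_way_assign`, and `process_block` are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    centroid: list
    size: int = 1

    def absorb(self, x):
        # Running-mean update: historical points are never revisited.
        self.size += 1
        self.centroid = [c + (v - c) / self.size
                         for c, v in zip(self.centroid, x)]


def three_way_assign(x, clusters, alpha, beta):
    """Return ("accept", idx), ("boundary", idx) or ("reject", None).

    alpha and beta are assumed distance thresholds, not values from the thesis.
    """
    if not clusters:
        return "reject", None
    dists = [math.dist(x, c.centroid) for c in clusters]
    idx = min(range(len(dists)), key=dists.__getitem__)
    if dists[idx] <= alpha:   # clearly inside the nearest cluster
        return "accept", idx
    if dists[idx] <= beta:    # too uncertain to decide: defer
        return "boundary", idx
    return "reject", None     # clearly outside every cluster


def process_block(block, clusters, boundary, alpha=1.0, beta=2.0):
    """Single pass over one data block, updating the model in place."""
    for x in block:
        decision, idx = three_way_assign(x, clusters, alpha, beta)
        if decision == "accept":
            clusters[idx].absorb(x)
        elif decision == "boundary":
            boundary.append((x, idx))  # kept aside for a later decision
        else:
            clusters.append(Cluster(centroid=list(x)))
    return clusters, boundary


# Two data blocks arriving one after the other (toy feature vectors).
clusters, boundary = [], []
blocks = [[(0.0, 0.0), (0.2, 0.0)], [(1.5, 0.0), (10.0, 10.0)]]
for block in blocks:
    process_block(block, clusters, boundary)
```

In this toy run the second block is handled without touching the points of the first block: only the centroids and sizes carry over. The sample `(1.5, 0.0)` falls between the two thresholds, so it is held in the boundary list instead of being forced into a cluster or treated as a new one, which is the kind of deferred decision that a two-way scheme cannot express.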
Keywords/Search Tags: Anomaly detection, Incremental clustering, Three-way decisions, Parallel computing