
Research On Robust Anomaly Detection And Its Interpretability

Posted on: 2022-12-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y Ye
Full Text: PDF
GTID: 2518306764467044
Subject: Computer Software and Application of Computer
Abstract/Summary:
In the era of big data, massive amounts of data are generated continuously in every industry, and analyzing and mining these data is an essential problem. Anomaly detection is an active research topic in data mining. Its goal is to find anomalies, that is, objects in a dataset whose behavior differs significantly from what is expected. Anomaly detection plays an important role in network communication security, financial risk control, industrial production, and other fields. However, data collected in real-world settings often contain a large amount of noise, which prevents an anomaly detection model from learning the true normal pattern of the data and ultimately degrades its accuracy. To address this problem, this thesis studies robust anomaly detection models under different noise scenarios, as well as the interpretability of anomaly detection results. The main contents and contributions fall into the following three aspects.

First, to address unreliable labels on time series data, an anomaly detection framework for multivariate time series based on multi-instance learning, named MILAD, is proposed. MILAD uses a recurrent neural network to capture temporal dependencies in the time series and extract discriminative representations, and uses a multi-instance learning loss to make the model aware of label noise on normal data. Experimental results on a real-world time series dataset with label noise show that the proposed algorithm has strong anomaly detection and early-warning ability, which is of practical significance.

Second, to address unreliable data, that is, data containing a large amount of noise, a self-supervised anomaly detection algorithm based on density estimation is proposed. The algorithm uses the density of the data as self-supervision information, models the normal distribution and the abnormal distribution separately, and calculates the anomaly score through Bayes' formula. Depending on the density estimation method, the algorithm can be further divided into distribution modeling based on global density estimation and distribution modeling based on density synchronous learning. The results show that self-supervised anomaly detection based on density estimation achieves competitive and relatively stable results, which addresses the problem of unreliable data.

Third, to cope with the poor interpretability of current anomaly detection models, a method for interpreting anomaly detection results based on tail probability is proposed. The method models the joint distribution of the data with a Copula function, calculates the tail probability of a sample in a specific subspace, and takes it as the anomaly score of that subspace, called the COPOD-Z score. It then searches for the subspace with the highest anomaly score, which serves as the explanation of the result. The results show that this score-search method based on the COPOD-Z score can explain anomaly detection results fully and reasonably, and achieves a good balance between effectiveness and running time.
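The idea of scoring subspaces by tail probability can be illustrated with a minimal sketch. The abstract does not define the COPOD-Z score, so the code below is not the thesis's method: it assumes an empirical joint tail probability (the fraction of reference samples at least as extreme as the query in every dimension of the subspace) as a stand-in for a copula-based estimate. The function names `joint_tail_probability` and `rank_subspaces` are hypothetical, and subspaces are only compared at a fixed size, because a joint tail probability can only shrink as dimensions are added.

```python
from itertools import combinations

import numpy as np


def joint_tail_probability(X, x, dims):
    """Smoothed fraction of rows in X at least as extreme as x in all dims."""
    dims = list(dims)
    sub, point = X[:, dims], x[dims]
    # Assumed tail direction: lower tail if x lies below the column median,
    # upper tail otherwise.
    lower = point < np.median(sub, axis=0)
    as_extreme = np.where(lower, sub <= point, sub >= point)
    count = as_extreme.all(axis=1).sum()
    # Add-one smoothing keeps the probability strictly positive for the log.
    return (count + 1) / (X.shape[0] + 1)


def rank_subspaces(X, x, size=2):
    """Rank all subspaces of a fixed size by -log(joint tail probability)."""
    scores = {
        dims: -np.log(joint_tail_probability(X, x, dims))
        for dims in combinations(range(X.shape[1]), size)
    }
    # Highest score first: the smallest tail probability is the most anomalous.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Deterministic demo data: every point of a 7 x 7 x 7 integer grid.
grid = np.arange(-3, 4)
X = np.array([[a, b, c] for a in grid for b in grid for c in grid], dtype=float)
query = np.array([3.0, -3.0, 0.0])  # extreme in dimensions 0 and 1 only

for dims, score in rank_subspaces(X, query, size=2):
    print(dims, round(float(score), 3))
```

In this demo the subspace (0, 1) is ranked first, since the query is extreme in both of those dimensions and unremarkable in dimension 2. The exhaustive enumeration is only practical for a small number of dimensions, so a real search would need a pruning or greedy strategy.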
Keywords/Search Tags: Data Mining, Anomaly Detection, Robustness, Interpretability