Font Size: a A A

A Study Of Anomaly Detection Algorithm Based On Self-representation

Posted on:2022-02-03Degree:MasterType:Thesis
Country:ChinaCandidate:E H LiFull Text:PDF
GTID:2518306530473414Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Anomaly detection is an important research field in data mining,and its goal is to detect abnormal data that deviates from expected behavior.Due to its wide application in many fields including credit fraud and network intrusion,anomaly detection has aroused the interest of many scholars at home and abroad,and has become a hot research topic,resulting in many classical anomaly detection models and methods.However,with the advent of the big data era and the rapid development of sensor technology,the dimensionality of data is increasing and the amount of data is exploding.Anomaly detection faces a difficulty that requires more efforts to be made:all pairs of data points in high-dimensional space tend to be almost equidistant.This makes it particularly difficult to obtain the similarity information of the data.This paper focuses on the above problem of anomaly detection and carries out research work from obtaining high-quality similarity information.Specifically,it proposed two anomaly detection algorithms based on the similarity of data,including kernel preserving embedding based method and reweighted nonconvex nonsmooth rank minimization based method.This paper first used two self-representation based methods to automatically learn similarity informa-tion,then constructed a graph based on the similarity information,and finally executed Markov random walk on the graph to analyze the abnormal data in the graph.Experimental results show that these methods effectively improve the performance of anomaly detection.The main work and achievements of this paper are summaried as follows:1.We proposed an anomaly detection algorithm with kernel preserving embedding.The method first automatically learned similarity relations between data through low-rank based self-representation method.In the process of learning similarity information,the typical self-representation method only concern to minimize reconstruction errors without involving the structural information of data.However,it is beneficial to preserve the manifold structure in-formation,especially the overall relationship,when obtaining similarity information.In this paper,we introduced kernel preserving embedding technology to retain as much structure in-formation as possible.Further,we adopted double nuclear norm to better approximate the rank function.On the basis of the similarity relations,each data point in the data set was regarded as a vertex in the graph,and a weighted directed graph was constructed.On this graph,a customized random walk was applied to identify abnormal data.2.We proposed an anomaly detection algorithm based on nonconvex rank minimization.For the problem that the convex nuclear norm as the low-rank matrix relaxation often leads to suboptimal solution,this paper proposed to use nonconvex and nonsmooth low-rank repre-sentation to fully reveal the intrinsic structure information and introduced adaptive reweighted strategy to obtain more accurate low-rank property when learning similarity information.Mean-while,the kernel preserving embedding was combined with the adaptive reweighted noncon-vex nonsmooth rank minimization to automatically learn similarity information.Then we con-structed a weighted directed graph based on similarity relationship,and added a virtual vertex connected to each point with two virtual edges on this graph to form a strong connected graph.Finally,under the principle that potential anomalies should be visited with less weight,weighted random walk was performed on the strong connected graph.After reaching equilibrium,the sta-tionary distribution was utilized to deduce the anomaly score of each data point.The higher the anomaly score,the more likely it is anomalous data.Comparative experiments with different algorithms on multiple public data sets show the effectiveness and superiority of the proposed method.
Keywords/Search Tags:anomaly detection, low-rank learning, kernel preserving embedding, non-convex optimization, random walk
PDF Full Text Request
Related items