
Research On Key Technologies Of Anomaly Detection Based On Multi-Source Data

Posted on: 2020-01-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Wang
GTID: 1368330596975722
Subject: Computer software and theory
Abstract/Summary:
As an important research field of data mining, anomaly detection analyzes data of different types and from different sources in order to model the anomalous objects hidden in it. To detect anomalous objects accurately, researchers have proposed models based on distance, density, clustering, and other criteria. Because of its wide application in academia and industry, anomaly detection has attracted a large number of researchers in related fields and has produced many classic models and methods. However, with the development of sensor networks and the advent of the big data era, data in many application fields has begun to show a multi-source trend: data types are diversified, and the dependencies between data are complicated. The correlations and differences between data from different sources are a key issue for anomaly detection based on multi-source data.

Based on the interdependence of data, this thesis constructs a graph-based anomaly detection model and uses a random walk method to analyze the anomalous nodes in the graph. It also constructs a multi-view anomaly detection model to characterize the correlations and differences between multi-source data; anomaly detection is then performed by exploiting the inconsistency of the data across views. In general, this thesis studies anomaly detection on multi-source data from four aspects:

(1) Traditional graph-based anomaly detection models tend to focus only on the nodes, the edges, or the correlations between them when assessing how anomalous each sample is, while ignoring the local neighborhood information of the samples. We propose an anomaly detection algorithm (LIGRW) based on the local information graph, an asymmetric weighted directed graph constructed on the dataset. A customized random walk process is applied to the graph, so that the random walker jumps with high probability from nodes corresponding to normal samples to nodes corresponding to abnormal samples. Because the asymmetric relationships in the local information graph may prevent the random walk process from converging normally, and following the principle that abnormal samples should be visited with higher probability, we propose two different types of restart vectors that ensure potential outlier nodes are selected with high probability whenever the process restarts.

(2) The results of neighborhood-based anomaly detection models depend heavily on the choice of neighborhood parameters. In addition, traditional anomaly detection models need specific proximity measures to calculate the similarities or distances between samples, which makes them inflexible when facing different types of datasets. We analyze how the scores assigned to different samples change under the proximity graph model, and find that the score changes obtained by different types of samples under different proximity graphs show different patterns. On this basis we construct a detection model called Abnormal Pattern Scoring (APS). The model can obtain better performance without parameter tuning. Moreover, when characterizing the relationships between samples in different types of datasets, the model is free to use whichever proximity measure is required.

(3) The correlation and inconsistency between data from different sources make it difficult for traditional detection methods to adapt. This thesis uses multiple views to represent the data from different sources, and proposes a fuzzy clustering based consensus model for multi-view anomaly detection (FCC). The algorithm integrates the data corresponding to different views into an extended feature space, in which fuzzy clustering is used to calculate, for each view, the membership vector of each sample with respect to the multiple cluster structures implied in the dataset. These vectors effectively characterize the membership behavior of the sample in different views. The FCC algorithm marks samples that behave differently in different views as abnormal objects. Experiments on artificial and real-world datasets verify the effectiveness of the algorithm.

(4) When analyzing anomalous objects in multi-source data, the FCC algorithm focuses on the inconsistency of a sample's behavior across views, while ignoring samples that deviate seriously in all views. To solve this problem, we propose a hybrid anomaly detection model (LRRMOD) based on low-rank representation. The model first uses the dataset itself as the dictionary to learn the relationships between the data samples through low-rank representation, and then uses these relationships to construct a similarity matrix. Affinity Propagation clustering is performed on the similarity matrix of each view to obtain the cluster representative point corresponding to each sample. The deviation of a sample from its cluster center is defined as its attribute anomaly score, and the inconsistent behavior of the sample across views is defined as its behavioral anomaly score. LRRMOD uses both the attribute anomaly score and the behavioral anomaly score to determine the final anomaly degree of the sample, which ensures that the model performs better than an algorithm that uses inconsistencies only.

Through the above research, this thesis contributes new solutions, at the theoretical level, to the problem of anomaly detection on multi-source data.
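The random-walk idea behind aspect (1) can be illustrated with a small sketch. This is not the LIGRW algorithm: the abstract does not give the graph construction or the two restart vectors, so the transition rule (jump probability proportional to Euclidean distance on a dense graph), the restart vector (proportional to the mean distance to the k nearest neighbours), and all names and parameters (`random_walk_outlier_scores`, `k`, `damping`) are assumptions chosen only to show how a walk with restart can concentrate visit probability on isolated samples.

```python
import numpy as np


def random_walk_outlier_scores(X, k=3, damping=0.85, tol=1e-10, max_iter=1000):
    """Score samples by the stationary visit probability of a random walk with restart.

    Illustrative stand-in only: the transition and restart rules are
    assumptions made for this sketch, not the LIGRW definitions.
    Assumes X holds more than k samples, at least two of them distinct.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise Euclidean distances (n x n), zero on the diagonal.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # Row-stochastic transition matrix: from node i the walker jumps to j
    # with probability proportional to d(i, j), so it drifts toward samples
    # that lie far from the others.
    P = dist / dist.sum(axis=1, keepdims=True)

    # Restart vector: proportional to the mean distance to the k nearest
    # neighbours, so restarts favour locally isolated samples.
    knn_dist = np.sort(dist, axis=1)[:, 1:k + 1].mean(axis=1)
    restart = knn_dist / knn_dist.sum()

    # Power iteration for pi = damping * P^T pi + (1 - damping) * restart.
    # The restart term makes the update a contraction, so it converges
    # regardless of the graph structure.
    pi = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_pi = damping * (P.T @ pi) + (1.0 - damping) * restart
        converged = np.abs(new_pi - pi).sum() < tol
        pi = new_pi
        if converged:
            break
    return pi


# Toy data: a tight 3 x 3 grid of "normal" points plus one distant point.
grid = np.array([[x, y] for x in (0.0, 0.1, 0.2) for y in (0.0, 0.1, 0.2)])
X = np.vstack([grid, [[5.0, 5.0]]])
scores = random_walk_outlier_scores(X)
most_anomalous = int(np.argmax(scores))  # index of the highest-scoring sample
```

A higher score means the walker visits the sample more often. In this sketch the restart term plays the role the abstract assigns to the restart vectors: it guarantees convergence and biases the walk toward candidate outliers.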
Keywords/Search Tags: multi-source data, anomaly detection, local information graph, anomaly pattern scoring, fuzzy clustering