Research On Outlier Detection Methods In Heterogeneous Networks

Posted on: 2018-04-20, Degree: Doctor, Type: Dissertation
Country: China, Candidate: L Liu, Full Text: PDF
GTID: 1318330515976117, Subject: Computer application technology
Abstract/Summary:
Heterogeneous information networks are ubiquitous. Mining the outliers hidden in such networks, that is, objects that deviate from normal data, is one of the important tasks in data mining. Outlier detection has wide applications in many fields, such as intrusion detection, fraud detection, prediction of terrorist attacks or suspicious events, and data de-noising. Research shows that, compared with mining periodic data, mining outliers that deviate from normal objects can often provide users with more valuable information. This dissertation studies outlier detection techniques from several perspectives, including outlier detection in static networks, outlier detection in dynamic networks, and mining outlier pairs. The specific contributions are summarized as follows.

1) A meta-path based outlier detection method is proposed for static heterogeneous information networks. The closeness degree between source objects and the features of target objects is obtained by analyzing the link relationships in the network structure, and from it the semantic similarity between objects is calculated. The meta-path and the closeness degree between objects are then combined to measure the reachable probability between objects of different types. Finally, the outlierness degree between any two nodes is computed from the reachable probability and the path length. In addition, each node is assigned a reliability weight to improve accuracy. A real dataset and a simulated dataset are used to validate the proposed method. Experimental results show that the method can effectively detect outliers in static networks while taking the semantic information of nodes into account.

2) Clustering, one of the most important information retrieval methods in data mining, has broad applications in outlier detection in heterogeneous networks. When new data are inserted, most traditional clustering methods need to re-compute over the whole dataset instead of updating part of the data incrementally. To solve this problem, this dissertation proposes an incremental bottom-up clustering method and applies it to outlier detection in dynamic heterogeneous networks. Before clustering, each node is treated as a single cluster. A new metric, called CV (comparison variation), is defined to judge iteratively whether the two closest clusters should be merged, or whether an existing cluster should be split, when the clusters change. The strictness of clustering can be controlled dynamically by adjusting a parameter in CV. With this metric, the number of clusters does not need to be fixed in advance; the most appropriate number is determined dynamically according to the size and quality of the dataset. Experimental results show that the proposed method clusters effectively and updates data incrementally.

3) A tensor representation based outlier detection method is proposed for dynamic heterogeneous information networks. The method constructs a tensor index tree from high-order data represented as a tensor. Features are added to the direct item set and the indirect item set by searching the tensor index tree. Outliers are detected dynamically by judging whether a data object deviates from its original cluster, according to a clustering method based on short text correlation. The model preserves the semantic information in heterogeneous networks while substantially reducing time and space complexity. Experimental results show that the method detects outliers effectively and efficiently in dynamic heterogeneous networks.

4) To analyze the influence of the difference between link structure similarity and semantic relationship similarity in heterogeneous networks, an outlier pair detection method based on this difference is proposed. First, the link structure similarity matrix and the semantic relationship similarity matrix of the target objects are constructed. The link structure similarity is calculated from the structural correlations of the target objects. The semantic relationship similarity is calculated by using the k-step index algorithm to obtain the feature representation of the target objects. Finally, a linear transformation of the matrices is applied to obtain, for any two objects, the difference between their link structure similarity and their semantic relationship similarity. Object pairs with higher difference values are treated as outlier pairs. Experimental results show that the method can effectively detect outlier pairs in heterogeneous networks.

5) To detect outlier pairs incrementally, this dissertation also proposes a tuple-based incremental outlier pair detection method. The 3-tuples, which represent the data in heterogeneous networks, store the target objects and the corresponding link weight. A combination step and a mirror step are proposed to obtain the structure similarity between any two targets. The concepts of prior node, descendant node and coverage rate are defined, both to reduce the number of parameters and to calculate the content-based similarity. The structure-based similarity and the content-based similarity are combined to compute the outlierness score. Finally, the dissertation shows how inserting and deleting 3-tuples updates the structure-based similarity, content-based similarity and outlierness score of the object pairs. Updating the outlierness score incrementally reduces time and space complexity. Experimental results show that representing the data in heterogeneous networks as tuples allows the outlierness score to be updated dynamically, which greatly improves efficiency.

The research on outlier detection in this dissertation consists of two parts: single outlier detection and outlier pair detection. Two detection methods are proposed for each part: 1) for single outlier detection, meta-path based outlier detection in static networks and tensor representation based outlier detection in dynamic networks; 2) for outlier pair detection, a method based on link structure and semantic relationship and an incremental method based on tuples. In addition, an incremental clustering method is proposed as part of detecting outliers in dynamic networks. Outlier detection in heterogeneous networks is a relatively new research direction with both a theoretical basis and practical significance.
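
As a rough illustration of the reachable probability used in the meta-path based method, the sketch below walks a probability distribution along a meta-path in a toy bibliographic network. The abstract does not give the actual formula, so the uniform random-walk step, the function names `step` and `reach_probability`, and the toy data are all assumptions made for illustration; the closeness degree, reliability weights and path-length term described in the abstract are not modeled.

```python
from collections import defaultdict

def step(dist, edges):
    """Advance a probability distribution one hop along one typed relation.

    dist:  {node: probability} over the current node type
    edges: {node: [neighbors]} for one relation, e.g. author -> papers
    """
    nxt = defaultdict(float)
    for node, p in dist.items():
        nbrs = edges.get(node, [])
        for nb in nbrs:
            # Assumption: a uniform choice among the typed neighbors.
            nxt[nb] += p / len(nbrs)
    return dict(nxt)

def reach_probability(source, meta_path):
    """Probability of reaching each end object from `source` along `meta_path`."""
    dist = {source: 1.0}
    for edges in meta_path:
        dist = step(dist, edges)
    return dist

# Hypothetical toy network: authors write papers, papers appear at venues.
author_paper = {"a1": ["p1", "p2"], "a2": ["p2"], "a3": ["p3", "p4"]}
paper_venue = {"p1": ["KDD"], "p2": ["KDD"], "p3": ["SIGGRAPH"], "p4": ["KDD"]}

apv = [author_paper, paper_venue]  # meta-path Author -> Paper -> Venue
probs = {a: reach_probability(a, apv) for a in author_paper}

for author, dist in probs.items():
    print(author, dist)
```

In this toy network, `a3` reaches the venue `KDD` with a lower probability than `a1` and `a2` do, which is the kind of signal an outlierness measure could build on.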
Keywords/Search Tags: Heterogeneous information networks, outlier detection, outlier pair, tensor representation, incremental computation, meta-path, CFu-tree, k-step index
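
The outlier pair idea from item 4) of the abstract, ranking object pairs by the gap between their link structure similarity and their semantic similarity, can be sketched as follows. This is only a minimal sketch under stated assumptions: Jaccard similarity over neighbor sets stands in for the structural similarity, cosine similarity over hand-written feature vectors stands in for the k-step index based semantic similarity, and an absolute difference stands in for the linear transformation of the matrices. None of these substitutions is confirmed by the abstract, and the function name `rank_outlier_pairs` and the data are hypothetical.

```python
from itertools import combinations
from math import sqrt

def jaccard(a, b):
    """Structural similarity stand-in: overlap of two neighbor sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cosine(u, v):
    """Semantic similarity stand-in: cosine of two feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def rank_outlier_pairs(links, features):
    """Return (difference, a, b) tuples, largest difference first."""
    scored = []
    for a, b in combinations(sorted(links), 2):
        structural = jaccard(links[a], links[b])
        semantic = cosine(features[a], features[b])
        scored.append((abs(structural - semantic), a, b))
    return sorted(scored, reverse=True)

# Hypothetical target objects: x and y share all neighbors but have
# unrelated features, so they form the strongest candidate outlier pair.
links = {"x": {"n1", "n2"}, "y": {"n1", "n2"}, "z": {"n3"}}
features = {"x": [1.0, 0.0], "y": [0.0, 1.0], "z": [1.0, 1.0]}

ranking = rank_outlier_pairs(links, features)
for diff, a, b in ranking:
    print(f"{a}-{b}: {diff:.3f}")
```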