Outlier Detection In Heterogeneous Information Networks

Posted on:2018-01-16

Degree:Master

Type:Thesis

Country:China

Candidate:M X Yao

Full Text:PDF

GTID:2428330545461184

Subject:Computer Science and Technology

Abstract/Summary:

Finding latent, valuable knowledge in massive data is a major challenge. It is often important to identify data that differ significantly from the rest, a task known as outlier detection. Most existing outlier detection algorithms target high-dimensional data, uncertain data, data streams, and time series data. Only recently has the number of outlier detection studies for information networks begun to grow. Information networks, especially heterogeneous information networks, have complex structure and rich information owing to the diversity of their vertices and edges, which poses additional challenges for outlier detection.

In this thesis, we define the concept of outliers with abnormal correlation in heterogeneous information networks, where abnormal correlation manifests as abnormality in the attribute characteristics and connection characteristics of the associated vertices. We then extend the existing query language framework so that it applies to the outlier detection studied in this thesis. Based on this correlation, we propose a new outlier detection algorithm called CBOut. In this algorithm, users are free to determine the type of outliers and the criteria used to measure whether vertices are outliers. The CBOut algorithm calculates the similarity matrix of the vertices in the network using a new similarity measure, then obtains clusters with the affinity propagation clustering method. Finally, all vertices within small-scale clusters are reported as outliers. Experimental results demonstrate that our method effectively detects the outliers defined in this thesis on synthetic and real datasets.

Different similarity measures are proposed for calculating the similarity matrix under a single measure criterion and under multiple measure criteria. For a single measure criterion, this thesis proposes a new method to optimize the similarity calculation across multiple queries. This method applies a least-frequently-used replacement strategy based on path length to selectively store the eigenvectors of associated vertices. It reduces the time for similarity calculation across multiple queries when the number of stored eigenvectors is limited. Experiments on a real dataset show that the optimization performs well.

For multiple measure criteria, the similarity measure used in the CBOut algorithm needs to assign a preference weight to each measure criterion. Users can specify these preference weights in the query language based on domain knowledge. When users cannot give the preference weights explicitly, this thesis also proposes an adaptive weight adjustment mechanism that obtains preference weights in accordance with the network characteristics. If a setting of preference weights leads to higher clustering quality, it also leads to more precise outlier detection. Experiments on synthetic and real datasets verify that the adaptive weight adjustment mechanism improves clustering quality after the weights are adjusted, and thereby improves the precision of outlier detection.
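The abstract does not give the CBOut formulas, so the following is only a rough sketch of two steps it describes: combining per-criterion similarity matrices with preference weights, and reporting the members of small clusters as outliers. The function names, the normalized weighted sum, the `max_size` threshold, and the toy matrices are all assumptions made for illustration. The clustering step itself (affinity propagation in the thesis) is not implemented here; the cluster labels are taken as given.

```python
from collections import Counter


def combine_similarities(matrices, weights):
    """Weighted average of per-criterion similarity matrices.

    Assumption: the criteria are merged by a normalized weighted sum;
    the thesis may use a different combination rule.
    """
    if len(matrices) != len(weights):
        raise ValueError("one weight per measure criterion is required")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    n = len(matrices[0])
    return [
        [sum(w * m[i][j] for m, w in zip(matrices, weights)) / total
         for j in range(n)]
        for i in range(n)
    ]


def small_cluster_outliers(labels, max_size):
    """Return indices of vertices whose cluster has at most max_size members.

    labels[v] is the cluster id of vertex v, assumed to come from a
    clustering step that is not shown here.
    """
    sizes = Counter(labels)
    return [v for v, c in enumerate(labels) if sizes[c] <= max_size]


# Hypothetical similarity matrices for three vertices under two criteria.
sim_attr = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
sim_link = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]

combined = combine_similarities([sim_attr, sim_link], [3, 1])
print(combined[0][1])  # 0.875

# Hypothetical cluster labels: vertex 2 sits alone in its cluster.
print(small_cluster_outliers([0, 0, 1], max_size=1))  # [2]
```

In this sketch the preference weights are supplied by the caller, which corresponds to the case where users specify them from domain knowledge; the adaptive weight adjustment mechanism is not modeled.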

Keywords/Search Tags:

Heterogeneous information networks, abnormal correlation, outlier detection

Related items

1	Research On Outlier Detection Methods In Heterogeneous Networks
2	WSN Gateway Abnormal Detection Based On Spatio-temporal Correlation
3	Abnormal Behavior Detection Algorithm And System For Medicare Data
4	Outlier Detection In Sensor Networks
5	Research On The Methods Of Outlier Detection In Wireless Sensor Networks
6	Outlier Detection Based On Data Correlation
7	Study On Algorithms For Outlier Detection In Wireless Sensor Networks
8	Application Of Outlier Detection In The Abnormal Analysis Of Medical Prescription
9	Research On Parallel Outlier Detection Method In Heterogeneous Distributed Environment
10	Research Of Perceiving The Unusual Action Of Coach Passengers Based On Fusion Of Heterogeneous Information