Multi-view Clustering Based Outlier Detection Algorithm

Posted on: 2017-12-30
Degree: Master
Type: Thesis
Country: China
Candidate: P Yao
Full Text: PDF
GTID: 2348330503965501
Subject: Computer software and theory
Abstract/Summary:
Outlier detection is one of the most important tasks in data mining. Its main goal is to find data points that differ significantly from most other data points. Such points often carry important, high-value information with a wide range of applications, and they support a more comprehensive understanding of the data. As a result, outlier detection algorithms have in recent years been widely used in fraud detection, intrusion detection, ecosystem disorders, public health, medicine and other fields.

In the era of big data, finding the outliers hidden in large, high-dimensional data sets has become a very challenging task. Traditional outlier detection algorithms have two limitations. On the one hand, methods such as clustering-based and density-based outlier detection look for outliers in the full attribute space, whereas some outliers appear only in certain subspaces and are hidden by the curse of dimensionality. On the other hand, traditional algorithms interpret the data from a single perspective, whereas a data set may contain several different mechanisms that can be interpreted from different perspectives.

To address both the curse of dimensionality and the single-perspective problem, this paper proposes an outlier detection algorithm based on multi-view clustering. The algorithm uses spectral clustering to ensure high-quality clustering results, and it uses the Hilbert-Schmidt independence criterion to ensure that each new clustering result is not redundant with the partitions that are already known.
A more accurate outlier set is then obtained by analyzing the outliers across the multiple views. The results show that the algorithm improves the accuracy of outlier detection.

Specifically, the main work of this paper is as follows:

1. From the perspective of outlier detection, the paper analyzes the problems of a single view, in which outlier information is submerged and outliers are hard to explain, and describes the advantages of multi-view analysis in solving these problems.
2. KDAC uses spectral clustering to obtain high-quality clustering results. This clustering analysis lays a solid foundation for discovering multiple views.
3. The Hilbert-Schmidt independence criterion (HSIC) is introduced as the evaluation measure for discovering a new view, ensuring that the new view is not redundant with the known patterns.
4. The candidate outlier sets from the multiple views are integrated to obtain the global top-N outliers.

To validate the effectiveness of the HSIC-based algorithm, experiments are conducted on multiple data sets. The experimental results show that the method improves outlier detection and is also good at finding outliers that go undetected from a single view.
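The role of HSIC as a redundancy check can be illustrated with a small sketch. The abstract does not give the thesis's exact formulation, so the code below is only an assumption-laden illustration: it uses the standard biased empirical estimator tr(KHLH)/(n-1)^2, an RBF kernel on the data, and a same-label indicator kernel on a known partition. The function names (`rbf_kernel`, `label_kernel`, `hsic`), the `gamma` value, and the toy data are all hypothetical and not taken from the thesis.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """RBF (Gaussian) kernel matrix for the rows of X (assumed kernel choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def label_kernel(labels):
    """Kernel on a partition: 1 if two points share a cluster label, else 0."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def hsic(K, L):
    """Biased empirical HSIC estimate: tr(K H L H) / (n - 1)^2."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Toy data (hypothetical): a known partition and two one-dimensional views.
rng = np.random.default_rng(0)
known = np.repeat([0, 1], 50)   # known partition: first 50 vs. last 50 points
other = np.tile([0, 1], 50)     # a grouping balanced within each known cluster

# View A is separated along the known partition, so it is redundant with it.
view_a = (4.0 * known + rng.normal(0.0, 0.5, 100)).reshape(-1, 1)
# View B is separated along the other grouping, so it carries new structure.
view_b = (4.0 * other + rng.normal(0.0, 0.5, 100)).reshape(-1, 1)

L_known = label_kernel(known)
hsic_a = hsic(rbf_kernel(view_a), L_known)
hsic_b = hsic(rbf_kernel(view_b), L_known)

print("HSIC(view A, known partition):", hsic_a)
print("HSIC(view B, known partition):", hsic_b)
```

In this toy setting a low HSIC value against the known partition marks a view as non-redundant, which is the property the abstract attributes to the criterion. How the thesis combines this measure with spectral clustering and the top-N integration is not specified here, so those steps are left out of the sketch.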
Keywords/Search Tags: outlier detection, multi-view, spectral clustering, Hilbert-Schmidt independence criterion, subspace