Outlier Detection Based On Data Correlation

Posted on: 2018-11-16
Degree: Master
Type: Thesis
Country: China
Candidate: M Zhao
Full Text: PDF
GTID: 2348330512475603
Subject: Signal and Information Processing
Abstract/Summary:
With the development of information technology and the growing popularity of the Internet, outlier detection has become an important problem in data mining. Outlier detection aims to identify abnormal values in observational data and is widely used in fields such as network intrusion detection, fraud detection for credit cards and mobile phones, activity monitoring, medical condition monitoring, and weather forecasting. Unlike normal data, outliers are regarded as a random phenomenon: they do not follow the distribution of the normal data, and the correlations that hold among normal data do not hold among outliers. Correlations in data comprise correlations between attributes and correlations between samples, i.e., structural correlations. Research on how to effectively exploit the difference in correlation between normal data and outliers therefore contributes to identifying outliers. This thesis studies correlations between data attributes and structural correlations. The main work is summarized as follows:

(1) To address outlier detection in high-dimensional and multi-view data, this thesis proposes a novel outlier detection method based on random correlation encoding. The high-dimensional data is randomly divided into multi-view data, and an RCCE space representing the correlations between views and attributes is then obtained by random canonical correlation analysis. On this basis, a discriminative model based on the Rayleigh distribution is constructed to characterize the difference in correlation between normal data and outliers. Finally, an integrated decision is made through statistical analysis of this difference.

(2) Structural correlations exist among normal data but not among outliers. Accordingly, this thesis proposes a novel outlier detection method based on label propagation. A graph model is used to exploit the intrinsic structure of the data, and multiple label propagation runs describe the difference in label confidence between unlabeled positive samples and test samples. Finally, an integrated decision for the test samples is made by analyzing the statistical properties of the label confidence of the positive samples.

(3) Considering the limitations of existing clustering-based outlier detection methods, this thesis proposes a novel outlier detection method based on hypergraph clustering. The algorithm first characterizes the structure of the data by hypergraph clustering. It then analyzes the correlation within local structures using an association-based discriminative model built on the clustering results. The difference in structural correlation between normal data and outliers is thereby transformed into a difference in degree of association. Finally, statistical analysis of the degree of association is used to detect outliers.
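
As a rough illustration of the label-propagation idea in item (2), the sketch below scores test samples by how much "normal" label confidence reaches them over a similarity graph. It is not the thesis's algorithm, whose details the abstract does not give. Everything specific here is an assumption made for illustration: the Gaussian affinity and its `sigma`, the closed-form label-spreading solve with damping `alpha`, the fraction of positive samples hidden in each round (`hide_frac`), and the quantile-based threshold. The function names are hypothetical.

```python
import numpy as np


def rbf_affinity(X, sigma):
    """Dense Gaussian affinity matrix with a zero diagonal (assumed graph model)."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def propagate(W, y, alpha):
    """Closed-form label spreading: F = (I - alpha * S)^{-1} y,
    where S = D^{-1/2} W D^{-1/2} is the symmetrically normalized affinity."""
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))  # guard isolated nodes
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.linalg.solve(np.eye(W.shape[0]) - alpha * S, y)


def detect_outliers(X_pos, X_test, rounds=20, hide_frac=0.3,
                    sigma=1.0, alpha=0.9, quantile=0.05, seed=0):
    """Flag test samples whose mean label confidence falls below a low
    quantile of the confidence received by held-out positive samples."""
    rng = np.random.default_rng(seed)
    X = np.vstack([X_pos, X_test])
    n_pos = len(X_pos)
    W = rbf_affinity(X, sigma)
    n_hide = max(1, int(hide_frac * n_pos))

    hidden_conf = []                      # confidence of unlabeled positives
    test_conf = np.zeros(len(X_test))     # accumulated confidence of test samples
    for _ in range(rounds):
        # Hide a random subset of positives so they act as unlabeled positives.
        hidden = rng.choice(n_pos, size=n_hide, replace=False)
        y = np.zeros(len(X))
        y[:n_pos] = 1.0
        y[hidden] = 0.0
        F = propagate(W, y, alpha)
        hidden_conf.append(F[hidden])
        test_conf += F[n_pos:]
    test_conf /= rounds

    # Decision rule (assumed): compare against the positives' own statistics.
    threshold = np.quantile(np.concatenate(hidden_conf), quantile)
    return test_conf < threshold, test_conf, threshold
```

Calling `detect_outliers(X_pos, X_test)` returns a boolean outlier flag per test sample, the averaged confidences, and the threshold. Test samples far from the normal cluster are only weakly connected to it in the graph, so they receive almost no confidence. With this particular quantile rule, roughly a `quantile` share of genuinely normal samples can also be expected to fall below the threshold.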
Keywords/Search Tags:Outlier Detection, Correlation, Multi-view, High Dimension, Label Propagation, Graph Model, Hypergraph