
A Hubness Outlier Detection Algorithm Based on Density Deviation

Posted on: 2021-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: H F Wang
Full Text: PDF
GTID: 2428330602473927
Subject: Software engineering
Abstract/Summary:
With the rapid development of the information age, the amount of data is growing by the hundreds of millions. Within such data, normal points reflect the overall trend, while abnormal points may carry more important information; for example, an anomaly in credit card data may indicate that the card has been stolen. Detecting outliers correctly therefore helps identify the important information in the data. Because data types and the ways in which data are generated are so varied, abnormal data arise for many different reasons, which makes anomaly detection harder. Most existing anomaly detection algorithms perform well on low-dimensional data sets, but their performance gradually declines as the number of features grows. To address this problem, this thesis starts from the relationship between hubness and anomaly score and incorporates the relationships among data features to improve detection performance. In addition, so that sampling preserves the distribution characteristics of the original data set as far as possible, a hubness anomaly detection algorithm based on density deviation, DDHOD (Density Deviated Hubness Outlier Detection), is proposed. The main contents of this thesis are as follows:

(1) Existing anomaly detection algorithms fail on high-dimensional data sets for two reasons: distance measures lose their discriminating power, and the relationships among the features of high-dimensional data are ignored. To improve detection accuracy, this thesis incorporates the relationships among data features. Moreover, most current sampling strategies are based on uniform sampling, so the resulting sample cannot fully reflect the distribution characteristics of the original data set. This thesis therefore adopts a density deviation sampling strategy, so that the final sample fully reflects those characteristics.

(2) Most existing anomaly detection algorithms struggle to avoid the curse of dimensionality caused by the growing number of features: they detect outliers accurately in low-dimensional data sets but gradually lose their effectiveness in high-dimensional ones. To solve this problem, this thesis starts from the characteristics of data sets at multiple scales and uses the relationship between the hubness and the anomaly score of data points for anomaly detection.

(3) This thesis proposes DDHOD, an anomaly detection algorithm based on density deviation sampling. Experiments on four real data sets show that the algorithm performs well in terms of ROC AUC and convergence. Experiments on synthetic data sets further show that its accuracy is not affected by differences in data density or in the underlying distribution model.
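
The abstract does not spell out how DDHOD turns hubness into an anomaly score, so the following is only a minimal sketch of the general idea it builds on, not the thesis's algorithm. It assumes the common definition of hubness as the k-occurrence count N_k(x), the number of other points that list x among their k nearest neighbours, and treats points with a low count as more likely outliers. The function names k_occurrence and roc_auc, the toy data, and the choice of -N_k as the score are all illustrative assumptions. The density deviation sampling step is not shown, since the abstract gives no details of it.

```python
import math


def k_occurrence(points, k):
    """Return N_k for each point: how many other points have it among their k nearest neighbours."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Euclidean distances from point i to every other point (self excluded).
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for _, j in dists[:k]:
            counts[j] += 1
    return counts


def roc_auc(scores, labels):
    """ROC AUC as the fraction of (outlier, inlier) pairs ranked correctly; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: four clustered points and one isolated point (label 1 = outlier).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 11)]
labels = [0, 0, 0, 0, 1]

counts = k_occurrence(points, k=2)
# Assumed scoring rule: the lower the hubness, the higher the anomaly score.
scores = [-c for c in counts]

print(counts)                   # [2, 3, 2, 3, 0]
print(roc_auc(scores, labels))  # 1.0
```

In this toy case the isolated point appears in no other point's neighbour list, so it receives the highest anomaly score. The brute-force neighbour search is quadratic in the number of points, which is one practical reason to score a sample rather than the full data set.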
Keywords/Search Tags: outlier detection, hubness, density deviation, matrix decomposition