
Research On Outlier Detection Approach Of High-dimensional Sparse Data Based On Interpolation

Posted on: 2021-05-20
Degree: Master
Type: Thesis
Country: China
Candidate: Z Tian
GTID: 2428330629988937
Subject: Engineering
Abstract/Summary:
Outliers are objects whose characteristics differ markedly from the rest of a dataset. They often carry important information, and outlier detection is widely used in financial transactions, intrusion detection and other fields. Because high-dimensional data is often sparse, outlier detection approaches that perform well on low-dimensional data degrade considerably in high-dimensional space. This thesis therefore draws on the idea of interpolation and studies clustering-based outlier detection for high-dimensional sparse data.
(1) A clustering algorithm based on interpolation (IB k-means) is proposed. It addresses the sparseness of high-dimensional data by interpolating samples to improve the clustering result, and it effectively supports outlier detection on high-dimensional sparse data.
(2) An interpolation-based outlier detection approach for high-dimensional sparse data (ODGA) is proposed. The IB k-means algorithm clusters the high-dimensional samples, and the N points farthest from the centroid are then identified as outliers. Compared with outlier detection approaches based on traditional and extended k-means clustering, ODGA loses fewer normal points, distinguishes normal from abnormal points accurately, and improves detection accuracy and precision.
(3) A local outlier detection approach based on LOF is proposed. For outlier detection in high-dimensional datasets with extremely uniform distribution density, combining ODGA with the local outlier factor (LOF) approach not only greatly reduces computation cost and saves storage space but also improves the recall of outliers.
Experiments show that using interpolation to improve the clustering of high-dimensional data is an effective choice. The proposed approach improves both the precision and the recall of outlier detection, and offers a new perspective on outlier detection for high-dimensional sparse data.
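The "cluster, then flag the N points farthest from the centroid" step in (2) can be illustrated with a minimal sketch. This is not the thesis's IB k-means or ODGA: the abstract does not describe the interpolation step, so plain Lloyd's k-means stands in for it here. The function names (`kmeans`, `top_n_outliers`), the parameters, and the toy data are all hypothetical, and the sketch assumes distance is measured to each point's own cluster centroid.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (a stand-in, NOT IB k-means).

    Returns (centroids, labels).
    """
    rng = random.Random(seed)
    # Initialise centroids from k distinct sample positions.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, new_labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels


def top_n_outliers(points, n_outliers, k=2, seed=0):
    """Indices of the n_outliers points farthest from their assigned centroid."""
    centroids, labels = kmeans(points, k, seed=seed)
    dists = [math.dist(p, centroids[l]) for p, l in zip(points, labels)]
    ranked = sorted(range(len(points)), key=lambda i: dists[i], reverse=True)
    return ranked[:n_outliers]


# Toy data (made up for illustration): two tight groups plus one stray point.
sample = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (5, 5),
]
print(top_n_outliers(sample, n_outliers=1, k=2))
```

Ranking by distance to the assigned centroid gives a global score, which is why the abstract pairs the clustering step with LOF in (3) when a density-based local score is needed.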
Keywords/Search Tags: outlier detection, clustering, interpolation, high-dimensional data, genetic algorithm