Research On Outlier Detection Approach Of High-dimensional Sparse Data Based On Interpolation

Posted on:2021-05-20

Degree:Master

Type:Thesis

Country:China

Candidate:Z Tian

Full Text:PDF

GTID:2428330629988937

Subject:Engineering

Abstract/Summary:


Outliers are objects whose characteristics differ markedly from the rest of a dataset. They often carry important information, and outlier detection is widely used in financial transactions, intrusion detection, and other fields. Because high-dimensional data is often sparse, outlier detection approaches that perform well on low-dimensional data degrade considerably in high-dimensional space. This thesis therefore draws on the idea of interpolation and studies clustering-based outlier detection approaches for high-dimensional sparse data.

(1) An interpolation-based clustering algorithm (IB k-means) is proposed. It targets the sparseness of high-dimensional data and improves the clustering effect through sample interpolation. The proposed clustering algorithm can also effectively support outlier detection on high-dimensional sparse data.

(2) An interpolation-based outlier detection approach for high-dimensional sparse data is proposed. The proposed IB k-means algorithm is used to cluster the high-dimensional data samples, and the N points farthest from the centroid are then identified as outliers. Compared with outlier detection approaches based on traditional and extended k-means clustering, the proposed ODGA algorithm loses fewer normal points, distinguishes normal and abnormal points accurately, and improves detection accuracy and precision.

(3) A local outlier detection approach based on LOF is proposed. For outlier detection in high-dimensional datasets with extremely uniform distribution density, combining the ODGA algorithm with the local outlier detection approach (LOF) greatly reduces the calculation cost, saves storage space, and improves the recall of outliers.

Experiments show that using interpolation to improve the clustering of high-dimensional data is an ideal choice. Moreover, the proposed approach improves both the precision and recall of outlier detection and provides a new perspective on outlier detection for high-dimensional sparse data.
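The cluster-then-rank step described in (2) can be illustrated with a minimal sketch. This is not the thesis's method: the abstract does not specify the interpolation step of IB k-means or the details of ODGA, so plain Lloyd's k-means is swapped in as a stand-in. The function names `assign`, `kmeans`, and `top_n_outliers`, the choice of the first k points as initial centroids, and the toy 2-D data are all assumptions made for illustration.

```python
import math


def assign(points, centroids):
    """Label each point with the index of its nearest centroid (Euclidean)."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means, a stand-in for (NOT an implementation of) IB k-means.

    Assumption: the first k points serve as initial centroids, which keeps
    the example deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Coordinate-wise mean of the cluster's members.
                new_centroids.append([sum(c) / len(members) for c in zip(*members)])
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Recompute labels so they always match the returned centroids.
    return centroids, assign(points, centroids)


def top_n_outliers(points, centroids, labels, n):
    """Indices of the n points farthest from their assigned centroid."""
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    return sorted(range(len(points)), key=lambda i: dists[i], reverse=True)[:n]


# Toy data (hypothetical): two tight groups plus one distant point at index 8.
points = [(0, 0), (10, 10), (0, 1), (1, 0), (1, 1),
          (10, 11), (11, 10), (11, 11), (4, 25)]
centroids, labels = kmeans(points, k=2)
print(top_n_outliers(points, centroids, labels, n=1))  # → [8]
```

Ranking by distance to the assigned centroid only works when the outlier does not end up as its own cluster, which is one reason the quality of the clustering step matters for this style of detection.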

Keywords/Search Tags:

outlier detection, clustering, interpolation, high-dimensional data, genetic algorithm


Related items

1	A Study On Outlier Detection Algorithms For High Dimensional Data
2	Analysis And Research Of Outlier Detection Algorithm For High Dimensional Data
3	Research On Outlier Detection Algorithm For High Dimensional Big Data
4	Research On Algorithm Of High Dimensional Outlier Detection
5	High-dimensional data mining: Subspace clustering, outlier detection and applications to classification
6	Research And Application On Outlier Detection Algorithm For High-dimensional Data Stream
7	Research On Outlier Detection Algorithm For High-Dimensional Data Based On Angle And Entropy
8	The Researches On Related To Key Technologies Among Clustering Based On High-dimensional Data Space
9	On Sparse AP Clustering Algorithm Based On Outliers Detection
10	Study Of Clustering And Outlier Detection Algorithm In Data Mining