The Outlier Detection Algorithm Based On Density Lifting Distance And Vector Module

Posted on:2024-08-08

Degree:Master

Type:Thesis

Country:China

Candidate:X Guo

Full Text:PDF

GTID:2568307151960389

Subject:Computer Science and Technology

Abstract/Summary:

In today's era of data explosion, data has become the most valuable resource and asset. The advent of the Big Data era is not only about collecting, analyzing, and processing more data; more importantly, it is about learning to make full use of data through data analysis and data mining. Outlier detection is an important technique in data mining, used in data cleaning, cluster analysis, information mining, and other fields. It identifies outliers in data and provides strong support for analysis, evaluation, interpretation, and prediction. This thesis addresses two shortcomings of traditional outlier detection methods: they are ineffective at detecting outliers in sparse clusters, and they have low detection accuracy on datasets with complex distributions. The main research contents are as follows.

First, the thesis analyzes how outliers that form sparse clusters cause density-based outlier detection algorithms to treat them as normal points, which leads to a high rate of missed detections. To address this, the density of data points is computed by kernel density estimation, with the local reachable distance used in place of the k-distance when calculating local density, which improves the accuracy of the density estimate. A density-based local density ratio is defined for detecting outliers in sparse clusters, and the density lifting distance is introduced. The local density ratio and the density lifting distance are then combined to define an outlier factor based on the density lifting distance, which yields the density lifting distance outlier detection algorithm.

Second, density-based outlier detection algorithms perform poorly on complex and multidimensional datasets and do not fully consider the local distribution of the data. To address this, a similarity matrix is constructed with a similarity function, and the degree of each data point is computed from the similarity matrix to obtain a diagonal matrix, the degree matrix. The dataset is pruned using the degree matrix to obtain a candidate outlier set. The local distribution of the data in the candidate outlier set is then fully considered, and a local outlier factor based on the vector module is proposed. Combining the pruning and detection strategies yields the outlier detection algorithm based on pruning and the vector module.

Finally, the correctness and robustness of the proposed algorithms are verified experimentally on both artificial and real datasets. The results show that the proposed algorithms detect outliers more effectively and comprehensively than several classical outlier detection algorithms.
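
The degree-based pruning step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not name the similarity function or the pruning rule, so everything specific here is an assumption made for illustration: the Gaussian similarity, the bandwidth `sigma`, the rule of keeping the lowest-degree fraction of points (`keep_fraction`), and the function names `gaussian_similarity` and `degree_prune`. It is not the thesis's actual implementation.

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    """Pairwise similarity matrix W (the Gaussian kernel is an assumed choice)."""
    # Squared Euclidean distance between every pair of rows of X.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # ignore self-similarity
    return W

def degree_prune(X, sigma=1.0, keep_fraction=0.2):
    """Return indices of candidate outliers and the degree of every point.

    The degree of point i is the i-th row sum of W, i.e. the i-th diagonal
    entry of the degree matrix. Keeping the lowest-degree fraction of the
    points as candidates is a hypothetical pruning rule.
    """
    W = gaussian_similarity(X, sigma)
    degrees = W.sum(axis=1)
    n_keep = max(1, int(np.ceil(keep_fraction * len(X))))
    candidates = np.argsort(degrees)[:n_keep]  # least-connected points first
    return candidates, degrees

# Synthetic data: one tight cluster plus a single distant point (index 50).
rng = np.random.default_rng(0)
cluster = rng.normal(0.0, 0.5, size=(50, 2))
far_point = np.array([[8.0, 8.0]])
X = np.vstack([cluster, far_point])

candidates, degrees = degree_prune(X, sigma=1.0, keep_fraction=0.1)
print("candidate outlier indices:", candidates)
```

In this sketch a point that is weakly similar to all others receives a small degree, so it survives pruning and enters the candidate set, where a more expensive local outlier factor would then be evaluated.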

Keywords/Search Tags:

data mining, outliers, lifting distance, similarity matrix, vector module

Related items

1	K-distance-based Outliers And Clustering Algorithm
2	Mining Association Rules Among Outliers Based On Histogram And FP-growth
3	Research On Outliers Detection In Data Stream Based On Unsupervised Learning
4	Detection of multivariate mean vector and covariance matrix outliers in behavioral sciences data
5	Research Of Outliers Mining Applied In Snort System Improvement
6	Research On Two Types Of Spectral Clustering Algorithms Based On Sparse Similarity Matrix
7	A Research On Outliers Mining Algorithm Based On Heat Metering Data
8	Research On The Similarity-Based Time Series Data Mining
9	Research On Outliers Mining Method To Web Content
10	Research On Data Mining Clustering Algorithm Based On Improved SC