In today's era of data explosion, data has become one of the most valuable resources and assets. The Big Data era brings not only growth in data collection, analysis, and processing but, more importantly, the need to make full use of data through data analysis and data mining. Outlier detection is an important data mining technique used in data cleaning, cluster analysis, information mining, and other fields. It identifies outliers in data and provides strong support for analysis, evaluation, interpretation, and prediction. This paper addresses two weaknesses of traditional outlier detection methods: they are ineffective at detecting outliers that form sparse clusters, and they have low detection accuracy on datasets with complex distributions. The main research contents are as follows.

Firstly, when outliers form sparse clusters, density-based outlier detection algorithms easily treat these outliers as normal points, which leads to a high rate of missed detections. To address this, the density of each data point is computed by kernel density estimation, with the local reachability distance used in place of the k-distance to improve the accuracy of the local density calculation. A density-based local density ratio is defined for detecting outliers in sparse clusters, and the density lifting distance is introduced. The local density ratio and the density lifting distance are then combined to define an outlier factor based on the density lifting distance, yielding the density lifting distance outlier detection algorithm.

Secondly, density-based outlier detection algorithms perform poorly on complex and multidimensional datasets and do not fully consider the local distribution of the data. To address this, a similarity matrix is constructed with a similarity function, and the degree of each data point is computed from the similarity matrix to obtain a diagonal matrix, the degree matrix. The dataset is pruned using the degree matrix to obtain a candidate outlier set. The local distribution of the data in the candidate outlier set is then fully considered, and a local outlier factor based on the vector modulus is proposed. Combining the pruning and detection strategies gives the outlier detection algorithm based on pruning and the vector modulus.

Finally, the correctness and robustness of the proposed algorithms are verified experimentally on both artificial and real datasets. The results show that the proposed algorithms detect outliers more effectively and comprehensively than several classical outlier detection algorithms.
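
The degree-matrix pruning step described above can be illustrated with a minimal sketch. It is not the implementation used in this paper: the Gaussian similarity function, the bandwidth `sigma`, the fixed `keep_ratio` pruning rule, and the function name `prune_by_degree` are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np


def prune_by_degree(X, sigma=1.0, keep_ratio=0.2):
    """Return indices of candidate outliers and the degree matrix.

    Assumptions (not taken from the paper): Gaussian similarity,
    row-sum degree, and keeping a fixed fraction of lowest-degree points.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Similarity matrix W; self-similarity is set to zero.
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # The degree of a point is its row sum; D = diag(degree) is the degree matrix.
    degree = W.sum(axis=1)
    D = np.diag(degree)
    # A low degree means few similar neighbours, so those points are kept as candidates.
    n_keep = max(1, int(np.ceil(keep_ratio * len(X))))
    candidates = np.argsort(degree)[:n_keep]
    return candidates, D


# Hypothetical toy data: one dense cluster plus two isolated points (indices 50 and 51).
rng = np.random.default_rng(0)
cluster = rng.normal(0.0, 0.5, size=(50, 2))
far_points = np.array([[8.0, 8.0], [9.0, -7.0]])
X = np.vstack([cluster, far_points])
candidates, D = prune_by_degree(X, sigma=1.0, keep_ratio=0.1)
```

In this sketch the two isolated points have a degree close to zero, so they fall into the candidate set, and only that reduced set would be passed on to a more expensive local outlier factor computation.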