Study On Density Kernel Clustering And Outlier Detection Algorithm Based On Skewness

Posted on: 2022-03-22
Degree: Master
Type: Thesis
Country: China
Candidate: Q Gao
GTID: 2518306536480394
Subject: Engineering
Abstract/Summary:
In recent years, the rapid development of the Internet and Internet of Things technology has greatly facilitated the collection of large-scale data, and the structure of that data has become more complex. Labeling massive, complex data sets is very difficult, so mining valuable information from data without relying on labels has become a research focus of unsupervised learning. Cluster analysis and outlier detection are two important research directions in unsupervised learning, with a wide range of applications such as face recognition, text segmentation, image processing, network intrusion detection, and credit fraud detection.

The concept of the density core gives clustering algorithms a strong advantage in identifying data sets with multiple density levels and complex shapes. DCore is the most representative of the clustering algorithms based on the density core. However, DCore does not adapt well to data sets with large differences in density levels, and its parameters are difficult to set. To solve these problems, this thesis proposes SDC (a skewness-based clustering algorithm with density core). First, combined with the natural neighbor search algorithm, the natural eigenvalue is used to improve both the statistical concept of skewness and the local density of data points. The compactness of the data is then constructed from these quantities, and core points are filtered out by setting appropriate thresholds. Next, a minimum spanning tree is built over the core points, and its long edges are cut so that each resulting subtree forms one cluster of core points. The remaining points are assigned labels according to the principle of proximity. Experiments show that the SDC algorithm proposed in this thesis can handle non-spherical data sets, complex manifolds, and data sets with large differences in density without setting parameters.

The LOF algorithm cannot detect outliers in complex manifolds, straight lines, or data sets with large differences in density levels. To address this, the thesis proposes ADD (an average divergence difference-based outlier detection method with skewed distribution of data objects). First, the concept of data point dispersion is proposed, based on natural neighbor theory and the statistical theory of skewness. Then, according to the accelerated change of data point dispersion, the concept of the average dispersion difference of data points is proposed, and thresholds are set to distinguish whether a point is normal or abnormal. Experiments show that the ADD algorithm proposed in this thesis can detect abnormal points in complex manifold data sets, linear data sets, and data sets with large differences in density levels without manually setting the number of neighbors.
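The core-point clustering stage described in the abstract (a minimum spanning tree over the core points, long edges cut, remaining points labeled by proximity) can be illustrated with a minimal Python sketch. It is not the thesis' implementation: the core points are assumed to be given as input (the skewness-based compactness filter is not reproduced), and the cut rule "edge longer than mean + cut_factor * standard deviation of the tree's edge lengths" is a hypothetical stand-in, as are the function names and the `cut_factor` parameter.

```python
import math
from statistics import mean, pstdev


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (weight, i, j) tree edges.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known distance from the tree to each node
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def cluster_core_points(core, cut_factor=2.0):
    """Label core points by cutting long MST edges.

    Assumption (not from the thesis): an edge is "long" when its length
    exceeds mean + cut_factor * std of all MST edge lengths.
    """
    edges = mst_edges(core)
    if not edges:
        return [0] * len(core)
    weights = [w for w, _, _ in edges]
    limit = mean(weights) + cut_factor * pstdev(weights)

    root = list(range(len(core)))   # union-find forest over core points

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]   # path halving
            i = root[i]
        return i

    for w, i, j in edges:
        if w <= limit:                # keep short edges, drop long ones
            root[find(i)] = find(j)

    ids = {}                          # map each subtree root to a compact label
    return [ids.setdefault(find(i), len(ids)) for i in range(len(core))]


def assign_remaining(core, core_labels, others):
    """Give each non-core point the label of its nearest core point."""
    return [
        core_labels[min(range(len(core)), key=lambda k: math.dist(core[k], p))]
        for p in others
    ]


core = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = cluster_core_points(core)
print(labels)                                                  # [0, 0, 0, 0, 1, 1, 1, 1]
print(assign_remaining(core, labels, [(0.5, 0.4), (10.5, 10.6)]))  # [0, 1]
```

In this toy input the single edge bridging the two groups is far longer than the others, so it is the only one cut. A fixed `cut_factor` is itself a parameter, which the SDC algorithm is reported to avoid; the sketch only shows the tree-cutting mechanics.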
Keywords/Search Tags: Clustering, Density Core, Natural Neighbors, Skewness, Outlier Detection