
Study On Cluster Analysis And Outlier Detection Based On Natural Neighbor And Density Core

Posted on: 2022-05-07 | Degree: Master | Type: Thesis
Country: China | Candidate: Y F Zha | Full Text: PDF
GTID: 2518306536976169 | Subject: Engineering
Abstract/Summary:
Cluster analysis and outlier detection are effective methods for assigning categories to massive data that carry no labels. Based on a similarity measure, cluster analysis divides a data set into several subsets (called clusters), so that data objects within a cluster are highly similar to one another while data objects in different clusters are less similar. Outlier detection marks outliers, which usually indicate abnormal data or events.

Density-based clustering has recently become a popular research topic. The DCore (Density-core-based clustering) algorithm extracts density core points according to the density of the data objects and uses them to classify the data. However, DCore has two defects. First, it has too many parameters, which are difficult to set, and the clustering result is sensitive to them. Second, it performs poorly on data sets with large differences in density. To address these shortcomings, this thesis proposes RDC (Relative density clustering), a density-core clustering algorithm based on natural neighbors. RDC defines a relative density on the natural neighbor structure and uses it to extract the density core, which greatly reduces the number of parameters needed for density calculation and density core extraction. Extracting the density core requires only one empirical parameter, which is independent of the data distribution and is therefore convenient for users to set. The relative density better preserves the shape of each cluster, so RDC can adapt to datasets of varying densities.

The LOF (Local Outlier Factor) algorithm is a popular density-based outlier detection algorithm. LOF also has two problems. First, the local outlier factor does not take into account the distribution of the nearest neighbors of a data object, so it fails to detect outliers in datasets with spherical clusters, complex manifolds, or varying densities. Second, the user has to specify the number of outliers in advance; this is the top-n problem. This thesis proposes WDOD (Weighted-density-based outlier detection), a weighted-density-based outlier detection algorithm built on natural neighbors. The weighted density considers the distribution of the natural neighbors of a data object, which makes the outlier factor more accurate; consequently, WDOD can detect outliers in datasets with complex manifolds and varying densities. In addition, WDOD avoids the top-n problem by extracting outliers with a threshold computed from the rate of variation of the outlier factor.

RDC and WDOD are compared with classic algorithms in their respective fields on synthetic and real datasets. In these experiments, RDC and WDOD perform better than the algorithms they are compared against.
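Both RDC and WDOD are built on the natural neighbor structure, which the abstract does not define. The following is a minimal sketch of a natural neighbor search as it is commonly described in the natural-neighbor literature, not the thesis's own implementation. The function name `natural_neighbors`, the brute-force distance sort, and the exact stopping rule (stop once the number of points that nobody lists as a neighbor reaches zero or stops changing) are assumptions made for illustration.

```python
import math


def natural_neighbors(points):
    """Sketch of a parameter-free natural neighbor search (assumed formulation).

    Returns (r, mutual), where r is the neighborhood size reached when the
    search stops and mutual[i] is the set of points j such that i and j are
    each among the other's r nearest neighbors.
    """
    n = len(points)
    # For every point, rank all other points by Euclidean distance.
    # Brute force, O(n^2 log n): acceptable for a sketch, not for large data.
    order = [
        sorted((j for j in range(n) if j != i),
               key=lambda j: math.dist(points[i], points[j]))
        for i in range(n)
    ]
    knn = [set() for _ in range(n)]   # neighbors collected so far
    reverse_count = [0] * n           # how many points list i as a neighbor
    prev_orphans = None
    r = 0
    while r < n - 1:
        # Grow every neighborhood by one: add the next nearest neighbor.
        for i in range(n):
            j = order[i][r]
            knn[i].add(j)
            reverse_count[j] += 1
        r += 1
        # "Orphans" are points that no other point lists as a neighbor.
        orphans = sum(1 for c in reverse_count if c == 0)
        # Assumed stopping rule: no orphans left, or the count has stabilized.
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans
    # Natural neighbors are the mutual neighbors at the final r.
    mutual = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
    return r, mutual
```

Calling `natural_neighbors([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)])` leaves the isolated point `(10, 10)` with no natural neighbors, while the four corner points are natural neighbors of one another. The neighborhood size is found by the search rather than supplied by the user, which is the property the abstract relies on when it says RDC and WDOD need few parameters.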
Keywords/Search Tags: Cluster Analysis, Outlier Detection, Density Core, Natural Neighbors