Font Size: a A A

Dynamic Group Nearest Neighbors And Its Applications In Outlier Detection And Clustering Analysis

Posted on:2021-05-16Degree:DoctorType:Dissertation
Country:ChinaCandidate:R M WangFull Text:PDF
GTID:1488306464458124Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
As two important branches of data mining,outlier detection and clustering analysis are under vigorous development at present,despite they have been studied extensively for many years.Nearest neighbors searching methods play an important fundamental role in outlier detection and clustering tasks,especially for local outlier detection and density-based clustering algorithms.However,traditional nearest neighbors searching methods,such as k NN(k-nearest neighbors)and Rk NN(Reverse k-nearest neighbors),are not feasible for quantifying neighborhoods for data of arbitrary shapes due to their simple but rigid searching ways,especially for non-spherical-shaped data.Therefore,the outlier detection and clustering algorithms that uses the traditional nearest neighbors searching methods usually have the problems of adaptability and parameter setting,especially for the data with complex shapes.To solve the problems mentioned above,this thesis presents a new nearest neighbors searching method and develops new outlier detection and clustering algorithms based on the new searching method.Aim to resolve the parameter setting and adaptability problems of traditional neighborhood quantitative methods,a new definition of nearest neighbors,named dynamic group nearest neighbors,has been proposed first.Unlike the existing nearest neighbors searching methods,the dynamic group nearest neighbors searching method uses dynamic data records instead of a single and fixed data point as references when it searches neighbors.Compared with traditional nearest neighbors searching methods,neighborhood quantified with the new strategy is more robustness,adaptable,and insensitive to parameter without any assumption.In light of the above advantages of dynamic group nearest neighbors,a new local outlier detection method and an improved density peaks clustering algorithm are proposed,respectively.The proposed local outlier detection method introduces dynamic group nearest neighbors to measure its neighborhoods,and define a new local anomalous degree to score each local region instead of each data record.Finally, outliers can be labeled easily according to the basic features of dynamic group nearest neighbors.The new local outlier detection approach is feasible for data of arbitrary shapes and can detect both single outlier and multiple outliers.Moreover,it is robust over a wide range of neighborhood parameter.Meanwhile,a hybrid local outer detection algorithm for large-scale datasets has been presented by combining the proposed local outlier detection and K-means.The hybrid detection method is suitable for large-scale datasets without significantly sacrificing detection accuracy,and insensitive to partitions of K-means.In clustering analysis,the proposed density peaks clustering algorithm only needs to deal with dynamic group nearest neighbors representatives whose size is much smaller than original data,in the case that the basic characteristics of the original data are well reflected.It also redefines three concepts by using dynamic group nearest neighbors: local density,relative distance,and decision rule,which are the core of density peaks clustering.The experiments show that the problem of fake cluster centers is significantly alleviated in the proposed density peaks clustering algorithm,especially for datasets with clusters of complex shapes and distributions.In addition,the parameter sensitivity problem of them is also greatly improved.In data mining,large-scale clustering is an active topic and great efforts have focused on developing algorithms for efficient and effective cluster analysis in large-scale datasets.A fast and non-parametric clustering algorithm,on the basis of representatives and local density,has been proposed in this thesis.First,the sub-cluster centers partitioned by K-means are regarded as representatives.And then a sparse graph of the representatives is constructed through dynamic group nearest neighbors.As the core of the large-scale clustering approach,a novel three-point decision is designed for separating the sparse graph of representatives and labeling data.The new large-scale clustering approach requires linear time and space complexity and does not require any user-provided parameter.The effectiveness of the proposed algorithms mentioned above is demonstrated by extensive experiments on both synthetic and real-world datasets.
Keywords/Search Tags:local outlier detection, clustering, dynamic group nearest neighbors, large-scale datasets
PDF Full Text Request
Related items