Research On Adaptive Clustering Method Based On The Shape Of Sample Distribution

Posted on: 2021-05-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M C Cheng
Full Text: PDF
GTID: 1488306728979689
Subject: Statistics
Abstract/Summary:
As an important branch of multivariate statistical analysis, cluster analysis is also one of the main methods of unsupervised learning. With the development of artificial intelligence, cluster analysis has received increasing attention in the field of statistics. In this work, we consider different data types and, starting from the distribution of the samples, construct judgment criteria based on morphological information extracted from the samples, combining split and merge techniques to achieve adaptive clustering.

In many fields, such as biology, medicine, and sociology, samples are distributed in spherical shapes in space. The core of clustering in such problems is to determine the positions of the centers of multiple spherical aggregation regions. For spherical clusters, we first start from the distribution of the sample itself and propose a dynamic K-means algorithm based on the quantile radius. The quantile radius of a cluster is defined as a quantile of the distances from the sample points to the cluster center. By comparing the distance between two centers with their quantile radii, clusters with large overlapping areas are merged, so that a suitable number of cluster centers is found automatically and used as the initial cluster centers. Finally, these initial conditions, obtained from the sample distribution, are passed to the K-means algorithm to produce the final clustering result, which to a certain extent solves both the need to specify the number of clusters manually and the problem of convergence to local optima.

Furthermore, rather than relying only on morphological information in the sample space, we map the samples to other spaces to mine more discriminative information. Starting from this idea, we propose a projection-based split and merge clustering algorithm. This method contains two key techniques: (1) shape recognition based on projection; (2) a split and merge process based on K-means. Projecting the samples onto the line connecting two cluster centers always yields a one-dimensional projection regardless of the data dimension, which keeps the amount of computation acceptable. Combining this with kernel density estimation, we estimate the density curve of the projection and judge whether the samples belong to the same cluster according to the number of peaks on the curve. Through the splitting and merging process, the algorithm not only addresses the sensitivity to the choice of initial conditions but also automatically gives a reasonable number of clusters. Finally, we discuss the extension of this method from K-means to other traditional clustering methods, such as the EM algorithm and the cross-entropy clustering algorithm.

Unlike spherical clusters, clusters in spatial, image, and other data can have arbitrary shapes, so the meaning of the center point is no longer clear, which causes a range of clustering methods based on center points or model discovery to fail. Therefore, with the help of grid techniques, we propose an adaptive-grid-based forest-like clustering algorithm. The algorithm uses the sample values on the dimension with large variance and determines an appropriate grid width from the minimum gap between the peaks and valleys of the density curve; because this width depends on the sample distribution, it overcomes the subjectivity of grid division to a certain extent. An adjacency search is then performed for high-density cells, and adjacent high-density cells are merged, based on the average distance between the sample points of the cells, to find clusters of arbitrary shape. With the adjacency check as a safeguard, the algorithm reduces the dependence of the merge process on threshold parameters.

To sum up, this work takes the distribution information of the samples as its motivation and main consideration. The constructed algorithms overcome the main problems of traditional methods and improve the adaptability of the algorithms to data. Experimental comparisons verify the effectiveness of the proposed algorithms and show that their time consumption is acceptable.
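
The projection-based shape test described in the abstract (project the samples onto the line connecting two cluster centers, estimate the one-dimensional density, count its peaks) might look roughly like the sketch below. It is only an illustration under stated assumptions, not the dissertation's implementation: the function names, the normal-reference bandwidth rule, the grid size, and the relative-height threshold for counting a peak are all choices made here for the example.

```python
import math
from statistics import NormalDist, stdev


def project_onto_center_line(points, center_a, center_b):
    """Scalar coordinate of each point along the line center_a -> center_b."""
    direction = [b - a for a, b in zip(center_a, center_b)]
    length = math.sqrt(sum(d * d for d in direction))
    if length == 0.0:
        raise ValueError("the two cluster centers coincide")
    unit = [d / length for d in direction]
    return [sum((x - a) * u for x, a, u in zip(p, center_a, unit)) for p in points]


def kde_on_grid(values, grid_size=256):
    """Gaussian kernel density estimate on an evenly spaced grid."""
    n = len(values)
    # Assumption: normal-reference rule of thumb for the bandwidth.
    bandwidth = 1.06 * stdev(values) * n ** (-0.2)
    if bandwidth == 0.0:
        raise ValueError("all projected values are identical")
    low = min(values) - 3 * bandwidth
    high = max(values) + 3 * bandwidth
    step = (high - low) / (grid_size - 1)
    scale = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    density = []
    for i in range(grid_size):
        g = low + i * step
        total = sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        density.append(scale * total)
    return density


def count_peaks(density, min_rel_height=0.1):
    """Count local maxima, ignoring bumps below a fraction of the tallest value.

    Assumption: the 10% relative-height cutoff is a hypothetical noise filter.
    """
    top = max(density)
    peaks = 0
    for i in range(1, len(density) - 1):
        is_local_max = density[i] > density[i - 1] and density[i] >= density[i + 1]
        if is_local_max and density[i] >= min_rel_height * top:
            peaks += 1
    return peaks


def same_cluster_by_projection(points, center_a, center_b):
    """True if the projected density has at most one peak (merge candidates)."""
    values = project_onto_center_line(points, center_a, center_b)
    return count_peaks(kde_on_grid(values)) <= 1


# Synthetic 2-D demo: deterministic normal quantiles stand in for random samples.
quantiles = [NormalDist().inv_cdf((i + 0.5) / 200) for i in range(200)]
one_blob = [(quantiles[i], quantiles[(7 * i) % 200]) for i in range(200)]
two_blobs = one_blob + [(x + 10.0, y) for x, y in one_blob]

# One blob split between two nearby centers versus two well-separated blobs.
print(same_cluster_by_projection(one_blob, (-1.0, 0.0), (1.0, 0.0)))
print(same_cluster_by_projection(two_blobs, (0.0, 0.0), (10.0, 0.0)))
```

In a split and merge loop, a single peak would suggest merging the two clusters, and more than one peak inside a single cluster's projection would suggest splitting it. The bandwidth matters: a bandwidth computed from the pooled spread can smooth two close clusters into one peak, so the rule used here should be treated as a placeholder.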
Keywords/Search Tags: Cluster analysis, Shape of sample distribution, Self-adaptation, Spherical, Arbitrary shape