
Research On Clustering Algorithm Based On Cluster Center Selection

Posted on: 2021-05-21
Degree: Master
Type: Thesis
Country: China
Candidate: L L Sun
GTID: 2428330602973786
Subject: Computer Science and Technology
Abstract/Summary:
Clustering is the process of placing data objects with similar characteristics in the same cluster and objects with different characteristics in different clusters. It plays an important role in analyzing the characteristics and internal structure of data. Clustering is now widely used in finance, information security, biological gene sequence analysis, image processing, customer segmentation, and other fields. Among clustering techniques, those based on cluster centers are among the most widely studied and remain an active research topic. Extracting cluster centers accurately is a pressing problem, and solving it is important for improving clustering quality. This thesis focuses on three problems in clustering algorithms: inaccurate determination of the number of cluster centers, parameter sensitivity, and low accuracy on datasets with unbalanced density. These problems are studied from the perspective of cluster center selection. The contributions of this thesis are as follows:

(1) A density clustering algorithm based on dynamic selection of cluster centers (CCDS) is proposed. First, the local density of every data object in the dataset is calculated. To make this calculation more accurate and less sensitive to parameters, an adaptive cutoff-distance mechanism combined with the k-nearest-neighbor idea is introduced into the local density calculation. A dynamic cluster center selection mechanism is then added to select appropriate cluster centers. Finally, the remaining objects are assigned to higher-density clusters according to the nearest-distance principle to form the final clusters. The algorithm addresses parameter sensitivity and avoids the bias introduced by selecting cluster centers manually. To verify its effectiveness, the algorithm is evaluated on artificial datasets and on UCI datasets of different dimensions and sizes; the results on both confirm its effectiveness.

(2) A clustering algorithm based on a two-stage object partition strategy (CA-TSP) is proposed. In existing clustering algorithms, a deviation in the density measurement causes cluster center selection to fail, so objects partitioned according to those centers are assigned incorrectly, which degrades the clustering result. This thesis therefore proposes a two-stage object partition strategy. In the first stage, a weighted divergence value is proposed, based on the concept of Kullback-Leibler divergence in statistics. It removes the interference of non-core objects, and a recursive object traversal method is used to establish nearest-neighbor relationships. The resulting core objects are partitioned according to these nearest-neighbor relationships. In the second stage, density weights are used to obtain cluster centers from the first-stage result, and the non-core objects are assigned according to their minimum distance to the cluster centers. Once all objects have been assigned, the clustering task is complete. The validity of CA-TSP has been verified on both synthetic datasets and UCI datasets. Compared with similar algorithms, it achieves higher clustering accuracy.

From low-dimensional to high-dimensional data, from small to large datasets, and from uniform-density data to data with complex distributions, the work in this thesis offers a new approach for research on clustering algorithms based on cluster centers. Through experimental analysis in fields such as medicine, finance, and biology, it helps move research on cluster-center-based clustering from theory toward practical application.
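The abstract does not give the formulas behind CCDS, so the following is only a minimal sketch of the general idea described in contribution (1): a generic density-peaks style procedure with a k-nearest-neighbor local density. The function name `knn_density_peaks`, the cutoff rule (mean k-th neighbor distance), the Gaussian kernel, and the "largest gap" rule for picking centers are all assumptions made for illustration; they are not the thesis's actual mechanisms.

```python
import numpy as np


def knn_density_peaks(X, k=4):
    """Illustrative density-peaks style clustering with a kNN local density."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances between all objects.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    # Distances to the k nearest neighbours (column 0 is the point itself).
    knn_dist = np.sort(dist, axis=1)[:, 1:k + 1]

    # Assumed adaptive cutoff: the mean k-th nearest-neighbour distance.
    d_c = knn_dist[:, -1].mean()

    # Local density: Gaussian kernel summed over the k nearest neighbours.
    rho = np.exp(-(knn_dist / d_c) ** 2).sum(axis=1)

    # Visit objects from densest to sparsest; ties keep input order.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    parent = np.full(n, -1)
    delta[order[0]] = dist[order[0]].max()
    for pos in range(1, n):
        i = order[pos]
        denser = order[:pos]  # objects visited earlier
        j = denser[np.argmin(dist[i, denser])]
        delta[i], parent[i] = dist[i, j], j

    # Assumed selection rule: cut the sorted gamma values at the largest gap.
    gamma = rho * delta
    ranked = np.argsort(-gamma, kind="stable")
    gaps = gamma[ranked[:-1]] - gamma[ranked[1:]]
    centers = ranked[:np.argmax(gaps) + 1]

    # Each remaining object inherits the label of its nearest denser object.
    labels = np.full(n, -1)
    labels[centers] = np.arange(len(centers))
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


# Two identical cross-shaped groups, 20 units apart (toy data).
blob = np.array([[0, 0], [1, 0], [-1, 0], [0, 1], [0, -1],
                 [2, 0], [-2, 0], [0, 2], [0, -2]], dtype=float)
X = np.vstack([blob, blob + [20, 0]])
labels, centers = knn_density_peaks(X, k=4)
```

On this toy data the two group midpoints are picked as centers without the number of clusters being passed in, which is the kind of behavior a dynamic selection mechanism aims for. The gap rule used here is a simple stand-in and would likely need refinement on noisy or unbalanced-density data.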
Keywords/Search Tags: clustering algorithm, cluster centers, weighted divergence value, multi-density, high-dimensional