
An Automatic Method To Determine The Number Of Clusters Based On Multi-Validity Indices

Posted on: 2020-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: N Sun
GTID: 2428330590971761
Subject: Computer technology
Abstract/Summary:
Cluster analysis is an important unsupervised learning method that is widely used in transportation, finance, agriculture, and medicine. However, determining the number of clusters remains difficult. The traditional approach of applying a single clustering validity index is biased toward that index's own criterion and assumes a hard partition. How to determine the optimal number of clusters from multiple perspectives, and how to describe the uncertain relationship between data objects and clusters, are still open problems. This thesis therefore proposes an automatic method to determine the number of clusters based on multi-validity indices.

To explore the number of clusters of a dataset from multiple angles, the thesis first proposes a method that automatically determines the number of clusters based on weighted internal validity indices. First, drawing on the idea of three-way decisions, it devises a three-way clustering algorithm built on the k-means algorithm, which generates clustering results for different values of k. Each cluster in three-way clustering is represented by a core region, a boundary region, and a trivial region. Then, a multi-validity index evaluation system is constructed from existing internal validity indices: the clustering results for the different k values are evaluated from different angles, and the evaluation results are ranked. Finally, according to the optimization strategy, the k value with the best comprehensive evaluation result is selected as the optimal number of clusters. The effectiveness of the algorithm is validated against each single validity index before fusion on UCI datasets, including Vowel and Waveform.

To further improve the performance of the algorithm and make full use of existing validity indices, the thesis draws on the idea of median partitions in clustering ensembles and treats each clustering result as a potential optimal clustering result. On this basis, external validity indices are introduced into the construction of the multi-validity evaluation system, and the thesis proposes a method to automatically determine the number of clusters based on both internal and external validity indices. The thesis also gives a proof of the validity of the automatic method based on multi-validity indices, and discusses how many validity indices to choose and which ones. Comparative experiments on UCI datasets show the effectiveness of the proposed algorithm compared with each single validity index before fusion and with other methods for determining the number of clusters. Furthermore, the thesis studies how to apply the algorithm to the problem of wine quality identification.
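
The abstract does not state the exact ranking or weighting scheme, so the following is only a minimal sketch of the general idea of fusing several validity indices by rank. It assumes a weighted rank sum (a Borda-style aggregation), equal weights, and three commonly used internal indices; the index names, the weights, and all score values are illustrative assumptions, not results or choices taken from the thesis.

```python
from typing import Dict

def rank_candidates(scores: Dict[int, float], higher_is_better: bool) -> Dict[int, int]:
    """Rank candidate k values under one validity index (1 = best)."""
    ordered = sorted(scores, key=lambda k: scores[k], reverse=higher_is_better)
    return {k: position + 1 for position, k in enumerate(ordered)}

def select_k(index_scores: Dict[str, Dict[int, float]],
             higher_is_better: Dict[str, bool],
             weights: Dict[str, float]) -> int:
    """Pick the k with the smallest weighted rank sum across all indices."""
    total: Dict[int, float] = {}
    for name, scores in index_scores.items():
        ranks = rank_candidates(scores, higher_is_better[name])
        for k, rank in ranks.items():
            total[k] = total.get(k, 0.0) + weights[name] * rank
    # Ties on the rank sum are broken in favour of the smaller k (an assumption).
    return min(total, key=lambda k: (total[k], k))

# Made-up scores for candidate k = 2..5, purely for illustration.
index_scores = {
    "silhouette":        {2: 0.41, 3: 0.55, 4: 0.48, 5: 0.37},
    "davies_bouldin":    {2: 0.95, 3: 0.70, 4: 0.66, 5: 1.10},
    "calinski_harabasz": {2: 310.0, 3: 520.0, 4: 480.0, 5: 400.0},
}
# Direction of each index: Davies-Bouldin is better when smaller.
higher_is_better = {
    "silhouette": True,
    "davies_bouldin": False,
    "calinski_harabasz": True,
}
# Equal weights are an assumption; the thesis describes a weighted scheme
# but the abstract gives no weight values.
weights = {"silhouette": 1.0, "davies_bouldin": 1.0, "calinski_harabasz": 1.0}

best_k = select_k(index_scores, higher_is_better, weights)
print(best_k)  # prints 3
```

With these toy values no single index decides the outcome alone: Davies-Bouldin prefers k = 4, while the other two prefer k = 3, and the rank sum settles on k = 3. Aggregating ranks rather than raw scores avoids having to normalise indices that live on very different scales.
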
Keywords/Search Tags:cluster analysis, uncertainty data, three-way decisions, clustering validity index, number of clusters