
Research On Determining Optimal Number Of Clusters In Cluster Analysis

Posted on: 2019-09-27; Degree: Master; Type: Thesis
Country: China; Candidate: X Zhang; Full Text: PDF
GTID: 2428330566499461; Subject: Applied statistics
Abstract/Summary:
As an important analysis method in data mining and machine learning, cluster analysis has been studied by many experts and scholars in recent decades. With the development of the Internet, a wide variety of data sources has emerged, which has driven the rapid development of cluster analysis methods and produced many results. However, clustering analysis still has many open problems, one of which is determining the optimal number of clusters. To address this problem, this dissertation studies cluster analysis and clustering validity indices. A new clustering validity index is proposed, and the K-means algorithm is improved and applied to the practical problem of Chinese news text clustering. The main research results of this dissertation are as follows:

1. An index based on generalization ability, named the GA index, is proposed. The index measures clustering validity through the generalization ability of the current clustering result. Logical reasoning and experimental data show that the index provides a better evaluation of the quality of clustering results.

2. In combination with the GA index, a method for determining the optimal number of clusters for K-means, named the KGA algorithm, is proposed. This removes the requirement that the number of clusters for K-means be specified in advance. Tests on artificial and real data sets show that the method can effectively determine the optimal number of clusters for K-means clustering.

3. Based on the GA index and the KGA algorithm, a Chinese news text clustering framework is designed. The framework improves the K-means algorithm through the GA index and the KGA algorithm, and applies the improved K-means algorithm to the practical problem of Chinese news text clustering. Experiments on 1,800 news items verify the practicability and effectiveness of the framework.

Finally, the work and results of this dissertation are summarized, and directions for future research are outlined.
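The general idea of choosing the number of clusters with a validity index can be sketched as follows. This is only an illustration, not the KGA algorithm: the abstract does not define the GA index, so the well-known silhouette coefficient is swapped in as a stand-in validity index. The helper names (`init_centers`, `kmeans`, `silhouette`, `choose_k`, `make_blobs`), the farthest-first initialization, and the synthetic data are all assumptions made for this example.

```python
import math
import random

def init_centers(points, k, rng):
    """Farthest-first initialization (an assumption; not from the dissertation)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    centers = init_centers(points, k, rng)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient, used here as a stand-in validity index."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def choose_k(points, k_values, seed=0):
    """Run K-means for each candidate k and keep the k with the best index value."""
    rng = random.Random(seed)
    scores = {k: silhouette(points, kmeans(points, k, rng)) for k in k_values}
    return max(scores, key=scores.get), scores

def make_blobs(centers, n_per, spread, seed=1):
    """Synthetic 2-D Gaussian blobs (illustrative data only)."""
    rng = random.Random(seed)
    return [(rng.gauss(cx, spread), rng.gauss(cy, spread))
            for cx, cy in centers for _ in range(n_per)]

points = make_blobs([(0, 0), (10, 0), (5, 9)], n_per=30, spread=0.5)
best_k, scores = choose_k(points, range(2, 7))
print(best_k, scores)
```

On three well-separated blobs like these, the index should peak at the true number of clusters. Any other validity index with a "higher is better" convention could be substituted for `silhouette` without changing `choose_k`.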
Keywords/Search Tags:cluster analysis, clustering validity index, generalization ability, optimal number of clusters, K-means clustering, text clustering