Research On Determining Optimal Number Of Clusters In Cluster Analysis

Posted on: 2021-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Zhang
GTID: 2428330620465665
Subject: Computer Science and Technology
Abstract/Summary:
As an unsupervised learning method, cluster analysis is an important means of extracting information and knowledge from unlabeled datasets, and an important research topic in data mining, statistics, and pattern recognition. Effective cluster analysis can reveal the inherent structure and characteristics of a dataset. With the development of data mining and artificial intelligence, research on cluster analysis has advanced considerably, and it is now widely used in customer recommendation, pattern segmentation, video and image processing, and other fields. However, existing cluster analysis methods still have many shortcomings. Determining the optimal number of clusters is an important part of cluster analysis and a key factor in clustering quality. This dissertation therefore studies clustering algorithms and clustering validity indexes. The main work is summarized as follows:

(1) Existing clustering methods still give unstable and inaccurate results across different kinds of datasets. To address these problems, a hybrid clustering algorithm, Kmeans-AHC, is proposed that combines the K-means and AHC (agglomerative hierarchical clustering) algorithms. The algorithm can effectively cluster datasets with multiple data structures, and it has lower time complexity than the traditional AHC algorithm.

(2) Based on the idea of inflexion point detection, a new clustering validity index, DAS (Difference of Average Synthesis degree), is proposed to evaluate the results of the Kmeans-AHC clustering algorithm. The DAS index outperforms commonly used clustering validity indexes in evaluating clustering results.

(3) By combining the Kmeans-AHC algorithm with the DAS index, an effective method for finding the optimal number of clusters and the optimal partition of a dataset is designed. The experimental results verify the effectiveness of the cluster analysis method proposed in this dissertation.

The dissertation closes by summarizing the main work and research results, analyzing the limitations of the proposed algorithm and index, and proposing future improvements and research directions.
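
The abstract does not give the details of Kmeans-AHC or the DAS index, so the following is only a rough sketch of the general idea it describes: over-segment the data with K-means, merge the resulting sub-clusters agglomeratively, and pick the number of clusters at the sharpest bend in the merge cost. Everything here is an assumption made for illustration: the function names, the farthest-first initialization, centroid linkage, and the "largest jump" rule, which is a simple stand-in for inflexion point detection and not the DAS index.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points, m, iters=20):
    """Lloyd's K-means with deterministic farthest-first initialization.

    Returns the non-empty groups of points (at most m of them).
    """
    centers = [points[0]]
    while len(centers) < m:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(m)]
        for p in points:
            nearest = min(range(m), key=lambda i: dist(p, centers[i]))
            groups[nearest].append(p)
        # Keep the old center if a group happens to be empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return [g for g in groups if g]


def agglomerate(groups):
    """Repeatedly merge the two groups with the closest centroids.

    Returns {k: (partition, cost)}, where cost is the centroid distance of
    the merge that produced k groups (0.0 for the initial partition).
    """
    groups = [list(g) for g in groups]
    history = {len(groups): ([list(g) for g in groups], 0.0)}
    while len(groups) > 1:
        cents = [mean(g) for g in groups]
        d, i, j = min((dist(cents[i], cents[j]), i, j)
                      for i in range(len(groups))
                      for j in range(i + 1, len(groups)))
        groups[i] = groups[i] + groups[j]
        del groups[j]
        history[len(groups)] = ([list(g) for g in groups], d)
    return history


def pick_k(history):
    """Pick k where merging one step further would raise the cost the most."""
    cost = {k: d for k, (_, d) in history.items()}
    candidates = [k for k in cost if k - 1 in cost]
    if not candidates:
        return 1
    return max(candidates, key=lambda k: cost[k - 1] - cost[k])


def best_partition(points, m):
    """Over-segment into up to m sub-clusters, merge them, then choose k."""
    history = agglomerate(kmeans(points, m))
    k = pick_k(history)
    return k, history[k][0]
```

Calling `best_partition(points, m)` with `m` somewhat larger than the expected number of clusters returns the chosen `k` and the matching partition. Because the agglomerative step works on `m` centroids instead of all points, its cost depends on `m`, which is one plausible reading of the reduced time complexity mentioned above.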
Keywords/Search Tags: Cluster Analysis, Clustering Algorithm, Clustering Validity Index, Optimal Clustering Number