The Study And Development Of Hierarchical-K-means-Based Clustering Algorithm

Posted on: 2016-11-23 | Degree: Master | Type: Thesis
Country: China | Candidate: P Li
GTID: 2348330542473755 | Subject: Computer technology
Abstract/Summary:
As an important part of data mining, cluster analysis has been widely applied in many fields. Because hybrid clustering algorithms are more effective at clustering, the Hierarchical K-means-Based clustering algorithm (H-K clustering algorithm) has become a focus of current research. This thesis studies existing clustering algorithms, including partitioning methods, hierarchical methods, and the H-K clustering algorithm, and proposes an improved H-K clustering algorithm, HKDE. It reviews in depth the state of clustering research in China and abroad, traditional clustering algorithms, distance and similarity measures, partitioning clustering methods, hierarchical clustering methods, and the traditional H-K clustering algorithm.

The traditional H-K clustering algorithm has limited clustering quality and efficiency and is sensitive to noise. To address these weaknesses, this thesis introduces a distance evaluation function to optimize the choice of K in the H-K algorithm, uses a k-d tree data structure to process the data more efficiently, and adopts information entropy as the similarity measure to reduce sensitivity to noise. On this basis it proposes the improved H-K clustering algorithm HKDE.

In the hierarchical clustering stage, HKDE must compute, for the current data object, the entropy increment of adding it to each cluster, which requires a large amount of computation. HKDE therefore introduces a distance threshold: it computes the entropy increment only for clusters whose center lies within the threshold distance of the current data object (distance less than or equal to the threshold). This reduces the number of entropy-increment computations and therefore the running time, improving the efficiency of the algorithm.

Simulation experiments verify the effectiveness of HKDE and compare it with the traditional H-K clustering algorithm in clustering quality, running efficiency, and handling of multidimensional data. The results show that HKDE is better suited to data clustering: it achieves higher clustering accuracy and efficiency and handles multidimensional data better.
Keywords/Search Tags: clustering, hierarchical k-means, distance evaluation function, information entropy clustering, distance threshold
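The abstract does not spell out the traditional H-K procedure. As it is commonly described, H-K clustering first runs agglomerative hierarchical clustering to obtain K initial cluster centers and then refines them with K-means. The following is a minimal sketch under stated assumptions (centroid linkage, Euclidean distance, K supplied by the caller); the function names are hypothetical and this is not the thesis's code.

```python
import math

# Hypothetical helpers; centroid linkage and Euclidean distance are assumptions.
def centroid(points):
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def agglomerate(points, k):
    """Merge the two clusters with the closest centroids until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            ci = centroid(clusters[i])
            for j in range(i + 1, len(clusters)):
                d = math.dist(ci, centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters

def assign(points, centers):
    """Group every point with its nearest center."""
    groups = [[] for _ in centers]
    for p in points:
        idx = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        groups[idx].append(p)
    return groups

def kmeans(points, centers, max_iter=100):
    """Standard K-means refinement starting from the given centers."""
    for _ in range(max_iter):
        groups = assign(points, centers)
        # Keep the old center if a group ends up empty.
        new_centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)

def hk_cluster(points, k):
    """Hierarchical stage supplies the initial centers; K-means refines them."""
    initial = [centroid(c) for c in agglomerate(points, k)]
    return kmeans(points, initial)
```

The naive agglomerative stage costs roughly O(n³) as written, so this sketch is only meant for small inputs.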
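The abstract says only that a distance evaluation function is used to optimize K; it does not give the formula. The sketch below uses a hypothetical stand-in: the mean point-to-center distance divided by the smallest distance between two centers, so compact, well-separated partitions score lower. The candidate partitions (for example, H-K results for several values of K) are assumed to be computed elsewhere, and `distance_evaluation` and `choose_k` are illustrative names.

```python
import math

def distance_evaluation(centers, groups):
    """Hypothetical score (not the thesis's formula): mean point-to-center
    distance divided by the smallest center-to-center distance. Lower is better."""
    n = sum(len(g) for g in groups)
    intra = sum(math.dist(p, c) for c, g in zip(centers, groups) for p in g) / n
    inter = min(math.dist(centers[i], centers[j])
                for i in range(len(centers))
                for j in range(i + 1, len(centers)))
    return intra / inter

def choose_k(partitions):
    """partitions maps each candidate K (>= 2) to a (centers, groups) pair;
    return the K whose partition has the lowest score."""
    return min(partitions, key=lambda k: distance_evaluation(*partitions[k]))
```

Candidates must have K ≥ 2, since the minimum center-to-center distance is undefined for K = 1, and the range should also be capped well below the number of points, because singleton clusters drive the numerator toward zero.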
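The abstract states that a k-d tree is used for data processing to improve efficiency but not exactly where. One common use in K-means-style algorithms is nearest-center search, so the sketch below assumes that role: it builds a k-d tree over the cluster centers and assigns each point to its nearest center. The names `build_kdtree`, `nearest`, and `assign_with_kdtree` are hypothetical.

```python
import math
from typing import NamedTuple, Optional, Tuple

Point = Tuple[float, ...]

class Node(NamedTuple):
    point: Point
    index: int          # position of this center in the original list
    axis: int           # coordinate used to split at this node
    left: Optional["Node"]
    right: Optional["Node"]

def build_kdtree(items, depth=0):
    """Build a k-d tree from (point, index) pairs, splitting at the median
    of coordinate depth % dim."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda item: item[0][axis])
    mid = len(items) // 2
    point, index = items[mid]
    return Node(point, index, axis,
                build_kdtree(items[:mid], depth + 1),
                build_kdtree(items[mid + 1:], depth + 1))

def nearest(node, target, best=None):
    """Return (distance, index) of the stored point closest to target."""
    if node is None:
        return best
    d = math.dist(node.point, target)
    if best is None or d < best[0]:
        best = (d, node.index)
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Visit the other side only if the splitting plane is closer than the best
    # match so far; otherwise that subtree cannot hold a closer point.
    if abs(diff) < best[0]:
        best = nearest(far, target, best)
    return best

def assign_with_kdtree(points, centers):
    """Index of the nearest center for every point, via one tree over the centers."""
    tree = build_kdtree([(c, i) for i, c in enumerate(centers)])
    return [nearest(tree, p)[1] for p in points]
```

The pruning test is what saves work: whole subtrees are skipped once the splitting plane is farther away than the best center found so far, instead of measuring the distance to every center.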
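The distance-threshold idea can be sketched as follows. The abstract does not define how entropy is computed on the data, so this sketch assumes each coordinate is discretized into bins of width `bin_width` and measures a cluster's size-weighted Shannon entropy over those bins. It also assumes that a point with no cluster center within the threshold starts a new cluster. Both rules, and all function names, are hypothetical; the sketch only shows how the threshold limits the number of entropy increments computed.

```python
import math
from collections import Counter

def weighted_entropy(members, bin_width):
    """|C| * H(C) summed over attributes, after discretizing each coordinate
    into bins of width bin_width (an assumption made for this sketch)."""
    n = len(members)
    total = 0.0
    for axis in range(len(members[0])):
        counts = Counter(math.floor(p[axis] / bin_width) for p in members)
        total -= sum(c * math.log2(c / n) for c in counts.values())
    return total

def entropy_increment(members, x, bin_width):
    return weighted_entropy(members + [x], bin_width) - weighted_entropy(members, bin_width)

def cluster_with_threshold(points, threshold, bin_width=1.0):
    """Assign points one at a time. Entropy increments are computed only for
    clusters whose center is within `threshold` of the point; if none is,
    the point starts a new cluster (assumed rule). Returns the clusters and
    the number of entropy increments actually evaluated."""
    clusters = []
    evaluated = 0
    for x in points:
        best = None
        for members in clusters:
            center = tuple(sum(col) / len(members) for col in zip(*members))
            if math.dist(x, center) > threshold:
                continue  # pruned: skip the costly entropy computation
            evaluated += 1
            delta = entropy_increment(members, x, bin_width)
            if best is None or delta < best[0]:
                best = (delta, members)
        if best is None:
            clusters.append([x])
        else:
            best[1].append(x)  # join the cluster with the smallest entropy increase
    return clusters, evaluated
```

With the five points `(0, 0), (0.2, 0.1), (10, 10), (10.1, 10.2), (0.1, 0.3)` and a threshold of 3, only three entropy increments are evaluated, because each point is compared only with clusters whose center is nearby.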