Study On H-K Clustering Algorithms Based On Ensemble Learning

Posted on: 2013-10-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y He
Full Text: PDF
GTID: 2248330374497711
Subject: Computer software and theory
Abstract/Summary:
Cluster analysis is an important and very active research branch of data mining, and it is widely used in pattern recognition, information retrieval, machine learning, biological population division, and other fields. The many clustering algorithms proposed so far fall mainly into the following categories: partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods. Each is suited to different fields according to its characteristics. The H-K clustering algorithm (Hierarchical K-means clustering) first uses a hierarchical clustering algorithm to produce an initial division of the dataset and then uses the K-means clustering algorithm to refine that result, so it combines the advantages of both methods while avoiding their disadvantages. As the traditional H-K clustering algorithm has come into wider practical use, some problems have become apparent, especially when processing large datasets and high-dimensional datasets. This thesis applies PCA (principal component analysis) and ensemble clustering to the traditional H-K clustering algorithm so that the improved algorithms obtain satisfactory clustering results on large and high-dimensional datasets. The main work is as follows:

1. PCA is introduced to improve the performance of the traditional H-K clustering algorithm. We propose a new clustering algorithm, PCAHK, which first uses PCA to project the high-dimensional dataset into a lower-dimensional space and then applies H-K clustering to the reduced dataset to obtain the final clustering result. The experimental results show that, compared with the traditional H-K clustering algorithm, PCAHK obtains better clustering results and reduces the time complexity.

2. Ensemble clustering is introduced to improve the traditional H-K clustering algorithm. We propose a new clustering algorithm named EPCAHK, intended to obtain satisfactory clustering results on large and high-dimensional datasets. The algorithm not only adopts ensemble learning but also applies the covariance matrix and the transitive closure, and so combines the advantages of both. The experimental results show that EPCAHK obtains better clustering results than previous similar algorithms.
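
The PCAHK pipeline described above (PCA projection, then hierarchical clustering for an initial division, then K-means refinement) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the centroid-linkage merge rule, the number of retained components, and the toy data are all assumptions made for illustration, and the naive agglomeration step is only practical for small datasets.

```python
import numpy as np

def pca_project(X, n_components):
    """Project the rows of X onto the top principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def agglomerative_centroids(X, k):
    """Naive agglomeration down to k clusters; returns the cluster centroids.

    Assumption: clusters are merged by smallest centroid distance.
    The abstract does not say which linkage the thesis uses.
    """
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > k:
        cents = np.array([X[c].mean(axis=0) for c in clusters])
        d = np.linalg.norm(cents[:, None] - cents[None, :], axis=2)
        np.fill_diagonal(d, np.inf)
        i, j = np.unravel_index(np.argmin(d), d.shape)
        i, j = min(i, j), max(i, j)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return np.array([X[c].mean(axis=0) for c in clusters])

def kmeans(X, centroids, n_iter=100):
    """Lloyd's K-means iterations starting from the given centroids."""
    centroids = centroids.copy()
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None] - centroids[None, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(len(centroids))
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

def pcahk(X, k, n_components):
    """PCA, then hierarchical initialization, then K-means refinement."""
    Z = pca_project(X, n_components)
    init = agglomerative_centroids(Z, k)
    return kmeans(Z, init)

# Toy data (assumed): three well-separated blobs in 10 dimensions.
rng = np.random.default_rng(0)
centers = np.array([0.0, 10.0, 20.0])
X = np.vstack([c + 0.5 * rng.standard_normal((20, 10)) for c in centers])
labels, centroids = pcahk(X, k=3, n_components=2)
print(labels)
```

Seeding K-means with the centroids from the hierarchical step removes its dependence on random initialization, and running both steps on the PCA-reduced data lowers the cost of every distance computation.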
Keywords/Search Tags: Clustering, Ensemble Learning, Principal Component Analysis (PCA), H-K (Hierarchical K-means Clustering)