
Research on Clustering Ensembles Based on the SEAM Algorithm and Its Application to Text

Posted on: 2010-08-09 | Degree: Master | Type: Thesis
Country: China | Candidate: L L Yang | Full Text: PDF
GTID: 2178360278952265 | Subject: Computer Science and Technology
Abstract/Summary:
Data mining arose with the rapid development of network and database technology. Clustering is one of the central tasks in data mining and an important method for partitioning or grouping data. It has been studied in many research areas, including pattern recognition, machine learning, statistical learning, data mining, and text mining, and has been applied in domains such as commerce, market analysis, biology, the Web, and classification.

Although some clustering algorithms are widely used, it is hard to find a suitable algorithm for a given dataset, because each algorithm places many restrictions on the data it can cluster well. Clustering ensembles emerged in response to this difficulty. The clustering ensemble problem is to combine multiple partitions of a set of objects without accessing the original features. Experimental results show that a clustering ensemble can give better results than a single clustering algorithm. The approach is still far from mature, however: open issues include the choice of some key parameters and the design of the consensus function.

The main work of this thesis is as follows:

1. After a thorough study of clustering ensembles, a new ensemble method is designed: a clustering ensemble based on the SEAM (Squared Error Adjacent Matrix) algorithm, called the ESEAM (Ensemble method based on SEAM) algorithm for short. Given multiple partitions of a dataset, a similarity matrix is generated by counting the number of times each pair of data objects occurs in the same cluster. The SEAM algorithm is then applied to this similarity matrix to obtain the final data partition. Experimental results show that the method is effective.

2. Text clustering, one of the application fields of cluster analysis, is reviewed. The proposed ensemble method is applied to a text dataset, and the experimental results are discussed.

Clustering ensembles can combine the results of many individual clustering algorithms, and, given the similarity matrix, the SEAM algorithm can find the final partition automatically, without the number of clusters or other initial parameters being specified in advance. The ESEAM algorithm therefore avoids the problems of choosing a proper clustering algorithm and the number of clusters for a given dataset.
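
The similarity-matrix step described in item 1 can be illustrated with a minimal Python sketch. It assumes each partition is given as a list of cluster labels, one per object; the function names and the toy data are hypothetical and do not come from the thesis. The abstract does not give the details of the SEAM algorithm, so the consensus step below uses a plainly different stand-in (connected components of the thresholded similarity graph) only to show how a final partition can be read off the matrix without fixing the number of clusters beforehand.

```python
def co_association(partitions):
    """Return the fraction of partitions in which each pair of objects
    falls in the same cluster (the co-occurrence similarity matrix)."""
    n_objects = len(partitions[0])
    counts = [[0] * n_objects for _ in range(n_objects)]
    for labels in partitions:
        for i in range(n_objects):
            for j in range(n_objects):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def consensus_by_threshold(similarity, threshold=0.5):
    """Stand-in consensus function (NOT the SEAM algorithm): link two
    objects when their similarity exceeds the threshold and return the
    connected components of the resulting graph as cluster labels."""
    n_objects = len(similarity)
    labels = [-1] * n_objects
    next_label = 0
    for start in range(n_objects):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first search over linked objects
            i = stack.pop()
            for j in range(n_objects):
                if labels[j] == -1 and similarity[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Hypothetical toy input: three partitions of six objects.
# Label values are arbitrary; only co-membership matters.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 2, 2],
]

similarity = co_association(partitions)
final_labels = consensus_by_threshold(similarity)
print(final_labels)
```

On this toy input, objects 0 to 2 share a cluster in at least two of the three partitions, as do objects 3 to 5, so the stand-in consensus step groups them into two clusters. The threshold of 0.5 is an illustrative choice, not a parameter taken from the thesis.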
Keywords/Search Tags: Data mining, Clustering algorithm, Clustering ensembles, Consensus function, Squared Error Adjacent Matrix (SEAM) algorithm, Text dataset