Integrated Clustering Algorithms And Applied Research

Posted on: 2013-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: B Y Liu
Full Text: PDF
GTID: 2218330371459723
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Clustering ensemble combines multiple clusterings of a dataset into a single consolidated clustering and can be used to produce more accurate and stable clustering results. In recent years, it has become a research focus in the machine learning field.

Clustering ensemble can be divided into three stages: generating base clusterings, obtaining the ensemble relationship, and determining the final clustering result. To generate base clusterings, different subsets of genes or other methods can be used; a co-association or cluster-association matrix can be used to obtain the ensemble relationship; finally, an algorithm such as hypergraph partitioning can be used to obtain the final clustering result.

This thesis studies and implements three new clustering ensemble algorithms and improves each of them. The specific work includes the following aspects:

(1) This thesis studies and implements the algorithm "fuzzy clustering ensemble based on random projections". A gene re-sampling technique is used to improve the generation of base clusterings, and a co-association matrix incremented by one is used to improve the step of obtaining the ensemble relationship.

(2) The first algorithm requires the number of clusters to be specified during clustering, so this thesis studies and implements a second algorithm: clustering ensemble based on multi-K. This algorithm can generate different numbers of clusters automatically. A co-association matrix incremented by K is used to emphasize the compactness and separation between samples, and in the final stage the stray samples are re-allocated.

(3) The two algorithms above both use a co-association matrix to capture the relationship between samples, and both ignore the relationship between clusters. LCE uses the similarities between clusters to refine the ensemble relationship. This thesis studies and implements this algorithm and presents a new algorithm for obtaining the final result: clustering re-ensemble based on a cluster-association matrix.

(4) Experimental results on real gene expression data indicate that the improved algorithms outperform the corresponding original algorithms; the improved multi-K algorithm performs better when the number of clusters is not specified; and when the number of clusters is specified, the improved LCE obtains better results.
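The three stages described in the abstract can be illustrated with a minimal sketch. It assumes the base clusterings are already available as label lists, builds a plain co-association matrix (the fraction of base clusterings in which two samples share a cluster), and derives a final partition by thresholding that matrix and taking connected components. The thresholding step is a simple stand-in chosen for illustration, not the hypergraph partitioning or re-ensemble method of the thesis, and the function names, the sample labels, and the 0.5 threshold are all hypothetical. The thesis's "incremented by K" variant, its re-allocation of stray samples, and LCE are not reproduced here because the abstract does not specify them.

```python
def co_association(base_clusterings):
    """Return the fraction of base clusterings in which each sample pair co-occurs."""
    n = len(base_clusterings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in base_clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1  # one vote per shared cluster
    m = len(base_clusterings)
    return [[c / m for c in row] for row in counts]


def consensus_by_threshold(coassoc, threshold=0.5):
    """Illustrative final stage: link pairs above the threshold, label components."""
    n = len(coassoc)
    labels = [-1] * n  # -1 marks a sample that has not been assigned yet
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first traversal of one connected component
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and coassoc[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three hypothetical base clusterings of six samples.
# Label values are arbitrary within each run; only co-membership matters.
base = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 2],
]
matrix = co_association(base)
final_labels = consensus_by_threshold(matrix, threshold=0.5)
print(final_labels)
```

In this toy input, the base clusterings disagree about samples 2 and 5, yet each of those samples shares a cluster with its neighbours in two of the three runs, so the consensus groups the first three samples together and the last three together.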
Keywords/Search Tags: clustering ensemble, base clustering, ensemble relationship, random projection, multi-K, re-ensemble