Refinement Of Base Clusters For Clustering Ensemble

Posted on:2018-06-19

Degree:Master

Type:Thesis

Country:China

Candidate:K Cheng

GTID:2348330536486033

Subject:Engineering

Abstract/Summary:

A cluster ensemble integrates multiple partitions of a dataset into a single new clustering that reflects the cluster structure of all the base clusters as fully as possible. The quality of the base clusters is therefore crucial to the final ensemble result. K-means is one of the most widely used algorithms for producing base partitions: it is easy to implement, its computational cost is low, and its clustering mechanism conforms to the machine learning assumption that the class-conditional probability of local data is constant. However, K-means usually adopts Euclidean distance as its distance measure, so it can only find spherical clusters. It cannot generate high-quality base clusters on datasets with complex structures, especially those whose classes are not spherically distributed but are defined by connectivity. This thesis therefore presents a refinement method for base clusters: the homogeneity of each cluster generated by K-means is evaluated, and clusters with poor homogeneity are partitioned again to improve it. As a result, the quality of the entire cluster ensemble improves. Experiments on 8 datasets demonstrate the effectiveness of the proposed method.

This thesis also presents a clustering ensemble scheme based on a refined association matrix, which yields a more stable and accurate final clustering. The scheme has two layers. In the first layer, K-means is applied to the dataset multiple times to obtain a number of base clusters, and a refined association matrix is generated by integrating each base cluster independently and iteratively. Compared with the traditional association matrix, the refined matrix better reflects the internal structure of the data. In the second layer, the refined association matrix is used to compute intra-class homogeneity and inter-class similarity, which guide the partitioning and merging of the base clusters to produce the final clustering. Experiments on 8 synthetic and real-world (UCI) datasets show that the proposed approach is effective.
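For readers unfamiliar with the traditional association (co-association) matrix mentioned in the abstract, the following is a minimal sketch of how such a matrix can be built from repeated K-means runs. It does not reproduce the refined matrix or the homogeneity test proposed in the thesis, whose details are not given here. The function names `kmeans_labels` and `co_association`, the toy data, and all parameter values are assumptions chosen for illustration only.

```python
import random


def kmeans_labels(points, k, seed, iters=20):
    """Plain Lloyd's K-means with squared Euclidean distance (illustrative helper).

    Returns one cluster label per point. Initial centers are k points
    sampled at random, so different seeds can give different partitions.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def co_association(points, k, runs):
    """Fraction of runs in which each pair of points shares a cluster."""
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    for seed in range(runs):  # one base partition per seed
        labels = kmeans_labels(points, k, seed)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / runs for c in row] for row in counts]


# Toy data (assumed): two small, well-separated groups in the plane.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
matrix = co_association(points, k=2, runs=10)

for row in matrix:
    print(" ".join(f"{v:.1f}" for v in row))
```

Each entry of `matrix` lies between 0 and 1, and a consensus clustering is usually obtained by treating it as a similarity matrix. An ensemble method in the spirit of the abstract would replace the simple counting step with its own refinement.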

Keywords/Search Tags:

clustering ensemble, K-means, base partitions, homogeneity, spurious Gaussian, inter-class similarity, association matrix

Related items

1	Clustering Ensemble Method And Application Based On Local Weighting And Inter-class Similarity
2	Design And Implementation Of Clustering Ensemble Algorithm Based On Partition Selection And Weighting
3	Research On Co-association Matrix Based Clustering Ensemble Algorithm
4	Research On Ensemble Clustering Algorithm Based On Improved Co-Association Matrix
5	Research On The Effectiveness Element Theory And Method Of Clustering Ensemble
6	Clustering Ensemble Based On Nonnegative Matrix Factorization
7	Research On Ensemble Clustering Algorithm
8	Clustering Ensemble Based On Density Peaks
9	The Research Of Clustering Ensemble Based On Genetic Algorithm And Co-association Matrix
10	Multi-view Ensemble Cluster Analysis Based On Joint Entropy And Negative Evidence And Its Application