Clustering Ensemble Algorithm Based On Mixed Data Representation

Posted on:2020-11-30

Degree:Master

Type:Thesis

Country:China

Candidate:X Li

GTID:2428330578473735

Subject:Computer application technology

Abstract/Summary:

Cluster analysis is an important research field in data mining and has been widely used in areas such as image processing, information retrieval, and bioinformatics. A variety of clustering algorithms have been developed so far. However, because of the complexity of the data and the influence of algorithm parameters, a single algorithm often cannot complete the clustering task effectively. Fusing multiple clustering results (clustering ensemble) is therefore an important research topic in cluster analysis. In a clustering ensemble, the effectiveness of the result is affected by the quality of the base clusterings. This thesis studies problems in clustering ensemble algorithms. The main research results are as follows.

(1) The base clusterings set is regarded as a set of categorical attributes of the data set, so that the base clusterings set and the data set together form mixed data. A clustering ensemble algorithm based on this mixed data representation is proposed. The algorithm is an extension of the K-Prototypes algorithm and obtains better base clusterings by iteratively updating the base clusterings set. Its result keeps consensus with both the structure of the data set and the base clusterings set. Comparisons with other clustering ensemble algorithms on several UCI data sets show that the proposed algorithm is more effective.

(2) Building on the first study, a new mixed data representation is proposed that aggregates the feature information of the data set, the pairwise constraint information, and the feature information of the base clusterings set. Based on this representation, a new semi-supervised clustering ensemble algorithm is proposed. The algorithm uses the NMF clustering algorithm to obtain and update the base clusterings. Its result keeps high consensus with all three kinds of information: the feature information of the data set, the pairwise constraints, and the feature information of the base clusterings set. The proposed algorithm is tested on UCI data sets, and comparisons with other algorithms show that it achieves higher clustering accuracy.

(3) The quality of the base clusterings is evaluated from two aspects: the feature space of the data set and the feature space of the categorical attributes constructed from the base clusterings set. On this basis, a second-weighted clustering ensemble algorithm is proposed. In the feature space of the data set, the quality of the base clusterings is characterized by the objective function of the clustering algorithm. In the feature space of the categorical attributes, the W-K-Modes algorithm is used to evaluate the quality of the base clusterings. The algorithm is compared with other existing clustering ensemble algorithms on several data sets, and the experimental results show that it improves the effectiveness of the ensemble result.

The clustering ensemble process has two main shortcomings: the results of most existing clustering ensemble algorithms keep consensus only with the base clusterings set, and most existing algorithms lack a validity evaluation of the base clusterings. To overcome these shortcomings, three different clustering ensemble algorithms are proposed in this thesis. They enrich the techniques available for cluster analysis and provide new technical support for it.
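The mixed data representation in (1) can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: it only appends the base clustering labels to the numeric features as categorical attributes and runs a plain K-Prototypes-style loop on the result, without the iterative update of the base clusterings set described above. The function names, the weight `gamma`, and the toy data are assumptions made for illustration.

```python
import numpy as np

def mixed_distance(x_num, x_cat, proto_num, proto_cat, gamma):
    """K-Prototypes-style dissimilarity: squared Euclidean distance on the
    numeric part plus gamma times the number of categorical mismatches."""
    numeric = np.sum((x_num - proto_num) ** 2)
    categorical = np.sum(x_cat != proto_cat)
    return numeric + gamma * categorical

def k_prototypes(X, B, k, gamma=1.0, n_iter=20, seed=0):
    """X: (n, d) numeric features. B: (n, m) integer labels, one column per
    base clustering, treated here as categorical attributes."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    init = rng.choice(n, size=k, replace=False)
    proto_num = X[init].astype(float)   # numeric part of each prototype
    proto_cat = B[init].copy()          # categorical part of each prototype
    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Assignment step: nearest prototype under the mixed dissimilarity.
        new_labels = np.array([
            np.argmin([mixed_distance(X[i], B[i], proto_num[j], proto_cat[j], gamma)
                       for j in range(k)])
            for i in range(n)
        ])
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: mean for numeric attributes, mode for categorical ones.
        for j in range(k):
            members = labels == j
            if not members.any():
                continue  # keep the old prototype if a cluster is empty
            proto_num[j] = X[members].mean(axis=0)
            for a in range(B.shape[1]):
                values, counts = np.unique(B[members, a], return_counts=True)
                proto_cat[j, a] = values[np.argmax(counts)]
    return labels

# Toy example (made-up data): two numeric blobs and two base clusterings,
# the second of which mislabels one object.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [10., 10.], [10., 11.], [11., 10.]])
B = np.array([[0, 1], [0, 1], [0, 0], [1, 0], [1, 0], [1, 0]])
labels = k_prototypes(X, B, k=2, gamma=1.0)
print(labels)
```

In this sketch `gamma` controls how strongly the consensus with the base clusterings weighs against the structure of the numeric data; the thesis's own weighting scheme is not given in the abstract.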

Keywords/Search Tags:

Cluster analysis, Clustering ensemble, Mixed data, Semi-supervised clustering, Weighted clustering

Related items

1	Research On Semi-supervised Classification Algorithm Based On Clustering Ensemble
2	Research On Semi-supervised Selective Clustering Ensemble
3	Semi Supervised Clustering Algorithm And Its Application And Research
4	Research On Clustering Ensemble And Semi-Supervised Clustering In Data Mining
5	Research On Theory And Technology Of Semi-Supervised Clustering Ensemble
6	Research On Semi-Supervised Clustering Ensemble Model
7	Research On Key Technologies Of Clustering Ensemble
8	Research On The Effectiveness Element Theory And Method Of Clustering Ensemble
9	Adaptive Semi-supervised Clustering Ensemble For High Dimensional Data
10	Research Of Semi-supervised Clustering Ensemble