Clustering Ensemble Algorithm Based On Mixed Data Representation

Posted on: 2020-11-30
Degree: Master
Type: Thesis
Country: China
Candidate: X Li
GTID: 2428330578473735
Subject: Computer application technology

Abstract/Summary:
Cluster analysis is an important research field in data mining, with wide applications in image processing, information retrieval, bioinformatics and other fields. A variety of clustering algorithms have been developed so far. However, because of the complexity of real data and the sensitivity of algorithms to their parameters, a single algorithm often cannot complete a clustering task effectively. Fusing multiple clustering results, known as clustering ensemble, has therefore become an important research topic in cluster analysis. In clustering ensemble, the effectiveness of the final result depends on the quality of the base clusterings. This thesis studies these problems in clustering ensemble algorithms. The main results are as follows.

(1) The set of base clusterings is treated as categorical attributes of the data set, so that the data set and the base clusterings set together form mixed data. On this representation, a clustering ensemble algorithm based on mixed data representation is proposed. The algorithm extends the K-Prototypes algorithm and obtains better base clusterings by iteratively updating the base clusterings set. Its result is consistent with both the structure of the data set and the base clusterings set. Comparisons with other clustering ensemble algorithms on several UCI data sets show that the proposed algorithm produces more effective results.

(2) Building on the first study, a new mixed data representation is proposed that aggregates the feature information of the data set, the pairwise constraint information, and the feature information of the base clusterings set. Based on this representation, a new semi-supervised clustering ensemble algorithm is proposed, which uses the NMF (non-negative matrix factorization) clustering algorithm to obtain and update the base clusterings. Its result achieves high consensus with all three sources of information: the features of the data set, the pairwise constraints, and the features of the base clusterings set. The algorithm is tested on UCI data sets, and comparisons with other algorithms show that it achieves higher clustering accuracy.

(3) The quality of the base clusterings is evaluated from two aspects: the feature space of the data set, and the feature space of categorical attributes constructed from the base clusterings set. On this basis, a second-weighted clustering ensemble algorithm is proposed. In the feature space of the data set, the quality of each base clustering is characterized by the objective function of the clustering algorithm; in the categorical-attribute feature space, the quality is evaluated with the W-K-Modes algorithm. Comparisons with existing clustering ensemble algorithms on several data sets show that the new algorithm improves the effectiveness of the ensemble result.

Most existing clustering ensemble algorithms have two main shortcomings: their results preserve consensus only with the base clusterings set, and they lack a validity evaluation of the base clusterings. To overcome these shortcomings, this thesis proposes three different clustering ensemble algorithms, which enrich clustering analysis techniques and provide new technical support for cluster analysis.
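The mixed data representation in result (1) can be illustrated with a minimal sketch: each object's numeric features are concatenated with its labels from every base clustering (one categorical attribute per base clustering), and a standard K-Prototypes loop is run on the result. The dissimilarity is the usual K-Prototypes one (squared Euclidean distance on the numeric part plus `gamma` times the number of categorical mismatches). The function names, the weight `gamma`, and the farthest-first initialization are assumptions for illustration; the thesis's rule for iteratively updating the base clusterings set is not described in the abstract and is not reproduced here.

```python
from collections import Counter


def mixed_distance(x_num, x_cat, p_num, p_cat, gamma):
    """K-Prototypes dissimilarity: squared Euclidean distance on the numeric
    part plus gamma times the number of mismatched categorical values."""
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, p_num))
    categorical = sum(1 for a, b in zip(x_cat, p_cat) if a != b)
    return numeric + gamma * categorical


def k_prototypes_ensemble(X, base_clusterings, k, gamma=1.0, max_iter=50):
    """Cluster the mixed representation [numeric features | base labels].

    X: list of numeric feature vectors, one per object.
    base_clusterings: list of label lists; base_clusterings[m][i] is the label
        that base clustering m gives object i (treated as a categorical value).
    """
    n = len(X)
    # Mixed representation: object i's categorical part is its label vector.
    cats = [list(labels) for labels in zip(*base_clusterings)]

    # Farthest-first initialization (an assumption, chosen for determinism).
    chosen = [0]
    while len(chosen) < k:
        far = max(range(n), key=lambda i: min(
            mixed_distance(X[i], cats[i], X[c], cats[c], gamma) for c in chosen))
        chosen.append(far)
    protos = [(list(X[c]), list(cats[c])) for c in chosen]

    assign = [-1] * n
    for _ in range(max_iter):
        new_assign = [
            min(range(k), key=lambda c: mixed_distance(
                X[i], cats[i], protos[c][0], protos[c][1], gamma))
            for i in range(n)
        ]
        if new_assign == assign:
            break  # converged: no object changed cluster
        assign = new_assign
        for c in range(k):
            members = [i for i in range(n) if assign[i] == c]
            if not members:
                continue  # keep the previous prototype for an empty cluster
            # Numeric part of the prototype: per-feature mean.
            mean = [sum(X[i][d] for i in members) / len(members)
                    for d in range(len(X[0]))]
            # Categorical part: per-base-clustering mode of the labels.
            mode = [Counter(cats[i][m] for i in members).most_common(1)[0][0]
                    for m in range(len(cats[0]))]
            protos[c] = (mean, mode)
    return assign


if __name__ == "__main__":
    X = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]]
    base = [[0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0]]
    print(k_prototypes_ensemble(X, base, k=2))  # [0, 0, 0, 1, 1, 1]
```

Note that the two toy base clusterings use different label names for the same groups; matching on per-attribute modes means no label alignment between base clusterings is needed, which is one practical attraction of treating them as categorical attributes.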
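For result (2), the abstract states only that NMF clustering is used to obtain and update the base clusterings on an aggregated representation. A minimal sketch of NMF clustering follows, assuming the standard multiplicative updates for the Frobenius loss and labelling each object by the row of H holding its largest coefficient. `stack_representation` is a hypothetical aggregation (non-negative numeric features plus one indicator row per base cluster); the pairwise-constraint information is not modelled, and none of the names come from the thesis.

```python
import random


def matmul(A, B):
    """Plain-Python product of list-of-lists matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def stack_representation(X, base_clusterings):
    """Hypothetical aggregation: a features-by-objects matrix whose rows are the
    (non-negative) numeric features followed by one 0/1 indicator row per
    (base clustering, cluster) pair."""
    rows = [list(col) for col in zip(*X)]
    for labels in base_clusterings:
        for c in sorted(set(labels)):
            rows.append([1.0 if l == c else 0.0 for l in labels])
    return rows


def nmf_cluster(V, k, iters=300, seed=0, eps=1e-9):
    """Factorize non-negative V (features x objects) as V ~ W H using
    multiplicative updates, then label each object (column of V) by argmax of
    its column in H."""
    rng = random.Random(seed)
    f, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(f)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H); eps guards against division by zero.
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[r][j] * num[r][j] / (den[r][j] + eps) for j in range(n)]
             for r in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][r] * num[i][r] / (den[i][r] + eps) for r in range(k)]
             for i in range(f)]
    labels = [max(range(k), key=lambda r: H[r][j]) for j in range(n)]
    return labels, W, H


def frob_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - w) ** 2 for rv, rw in zip(V, WH) for v, w in zip(rv, rw))


if __name__ == "__main__":
    X = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]]
    base = [[0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0]]
    V = stack_representation(X, base)
    labels, W, H = nmf_cluster(V, k=2)
    print(labels, round(frob_error(V, W, H), 4))
```

Multiplicative updates keep W and H non-negative automatically, which is why they are a common default for NMF clustering; they can converge slowly, so a real implementation would usually add a convergence test rather than a fixed iteration count.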
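Result (3) scores each base clustering in two spaces. The sketch below computes one plausible score for each, assuming (a) the feature-space objective is the k-means within-cluster sum of squares, and (b) the categorical-space weights follow a weighted k-modes style rule, w_m proportional to D_m^(-1/(beta-1)), which is the closed-form minimizer of sum_m w_m^beta D_m subject to sum_m w_m = 1 when every D_m > 0. The reference partition, `beta`, `eps`, and all function names are assumptions; the abstract does not say how the thesis combines the two scores, so no combination is shown.

```python
from collections import Counter


def feature_space_sse(X, labels):
    """Assumed feature-space quality: the k-means objective (within-cluster sum
    of squared distances to cluster means) of one base clustering.
    Lower means more compact clusters."""
    total = 0.0
    for c in set(labels):
        members = [x for x, l in zip(X, labels) if l == c]
        mean = [sum(col) / len(members) for col in zip(*members)]
        total += sum((v - m) ** 2 for x in members for v, m in zip(x, mean))
    return total


def kmodes_attribute_weights(base_clusterings, reference, beta=2.0, eps=1e-6):
    """Weight each base clustering (a categorical attribute) by its dispersion
    D_m with respect to a reference partition, using w_m ~ D_m**(-1/(beta-1)).

    D_m counts objects whose label under base clustering m differs from the
    mode of that label inside their reference cluster (the k-modes mismatch).
    """
    dispersions = []
    for labels in base_clusterings:
        d = 0
        for c in set(reference):
            in_c = [l for l, r in zip(labels, reference) if r == c]
            mode_count = Counter(in_c).most_common(1)[0][1]
            d += len(in_c) - mode_count  # mismatches with the cluster mode
        dispersions.append(d + eps)  # eps avoids 0 ** negative (assumption)
    raw = [d ** (-1.0 / (beta - 1.0)) for d in dispersions]
    total = sum(raw)
    return [r / total for r in raw]


if __name__ == "__main__":
    X = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]]
    good = [0, 0, 0, 1, 1, 1]
    noisy = [0, 1, 0, 1, 0, 1]
    print(feature_space_sse(X, good), feature_space_sse(X, noisy))
    # The consistent base clustering receives almost all of the weight.
    print(kmodes_attribute_weights([good, noisy], reference=good))
```

Larger `beta` flattens the weights toward uniform, while `beta` close to 1 concentrates weight on the least dispersed base clusterings; this exponent is the main knob such a weighting exposes.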
Keywords: Cluster analysis, Clustering ensemble, Mixed data, Semi-supervised clustering, Weighted clustering