
Research On Ensemble Clustering Algorithms For Complex Data

Posted on: 2018-08-27
Degree: Master
Type: Thesis
Country: China
Candidate: Q Y Shi
Full Text: PDF
GTID: 2348330521451616
Subject: Computer software and theory
Abstract/Summary:
Cluster ensembles have emerged as a powerful cluster analysis technique and have attracted considerable attention from researchers because of their good generalization ability. Researchers have recently made significant progress in ensemble clustering, and different ensemble clustering algorithms have been proposed for different application requirements. However, the existing algorithms are designed for simple data, whereas real-life data sets are usually mixed data described by both numerical and categorical attributes, and often also involve missing values, massive volume, and multiple sources. It is therefore meaningful to study ensemble clustering algorithms for complex data. This thesis investigates ensemble clustering for complex data. The main content is as follows:

(1) We introduce the whole process of ensemble clustering, summarize and analyze typical existing algorithms for the two most important problems in that process, and introduce three popular criteria for evaluating the effectiveness of clustering algorithms.

(2) We propose an ensemble clustering algorithm for incomplete mixed data. First, the algorithm completes the incomplete mixed data using three different missing-value filling methods. Then, a set of clustering solutions is produced by running the K-Prototypes clustering algorithm multiple times on each of the three resulting complete data sets. Next, a similarity matrix is constructed from all the clustering solutions. Finally, the final clustering result is obtained by applying a hierarchical clustering algorithm to the similarity matrix. The effectiveness of the proposed algorithm is demonstrated empirically on several real UCI data sets. The experimental results show that the proposed algorithm achieves higher clustering quality than several traditional clustering algorithms.

(3) We propose an ensemble clustering algorithm for multi-source data. The algorithm is highly scalable and runs efficiently because it works at the cluster level in the ensemble stage, and it defines a new similarity measure for clusters to account for the different feature spaces of multi-source data. The effectiveness and efficiency of the proposed algorithm are demonstrated empirically on several real multi-source data sets. The experimental results show that the proposed algorithm outperforms several traditional ensemble clustering algorithms.

The two algorithms proposed in this thesis take both clustering accuracy and execution time into account, and effectively address the problems of ensemble clustering for complex data, which is widespread in practical applications. The research results provide new strategies for ensemble clustering of complex data and further enrich the research on cluster analysis of complex data.
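The ensemble stage described in contribution (2) can be illustrated with a minimal sketch. The abstract does not say how the similarity matrix is built or which linkage is used, so the sketch assumes a co-association matrix (the fraction of clustering solutions in which two objects share a cluster) and average linkage. The function names `co_association` and `average_linkage` and the label lists are hypothetical stand-ins. The missing-value filling and the K-Prototypes runs are not shown.

```python
from itertools import combinations


def co_association(partitions):
    """Similarity of two objects = fraction of partitions that co-cluster them.

    Assumption: the thesis's similarity matrix may be defined differently.
    """
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def average_linkage(sim, k):
    """Agglomerative clustering on a similarity matrix (assumed linkage).

    Repeatedly merges the two groups with the highest mean pairwise
    similarity until k groups remain, then returns one label per object.
    """
    groups = [[i] for i in range(len(sim))]
    while len(groups) > k:
        best = None
        for a, b in combinations(range(len(groups)), 2):
            pairs = [(i, j) for i in groups[a] for j in groups[b]]
            score = sum(sim[i][j] for i, j in pairs) / len(pairs)
            if best is None or score > best[0]:
                best = (score, a, b)
        _, a, b = best
        groups[a] = groups[a] + groups[b]  # a < b, so index a stays valid
        del groups[b]
    labels = [0] * len(sim)
    for cluster_id, members in enumerate(groups):
        for i in members:
            labels[i] = cluster_id
    return labels


# Hypothetical base clusterings of six objects, standing in for K-Prototypes
# runs on three differently completed copies of the same data set.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
]

sim = co_association(partitions)
consensus = average_linkage(sim, k=2)
print(consensus)
```

Because the consensus is computed from label agreement only, the base clusterings may use arbitrary label names and need not agree on which cluster is "0" or "1". The pairwise matrix grows quadratically with the number of objects, so this sketch suits small data sets only.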
Keywords/Search Tags: Ensemble clustering, Incomplete data, Mixed data, Multi-source data, K-Prototypes clustering algorithm