Research On The Effectiveness Element Theory And Method Of Clustering Ensemble

Posted on: 2021-04-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F J Li
GTID: 1368330620463277
Subject: Computer application technology
Abstract/Summary:
With the rapid development of information technology, data is becoming an important resource in today's society. On the one hand, advances in sensor and storage technology have led to the accumulation of large amounts of data in many fields; on the other hand, advances in computing power and intelligent data processing technology provide technical support for data processing. Clustering analysis is a key technology for extracting value from data: it mainly handles the ubiquitous unlabeled data and provides effective preprocessing for many data processing technologies. Researchers have proposed a large number of clustering models and algorithms. However, most clustering algorithms are suitable only for specific scenarios and under specific assumptions. The complexity of current data environments poses a huge challenge to the effectiveness, robustness, and stability of any single clustering method. Clustering ensemble technology, which integrates multiple heterogeneous clustering results, is an important strategy for addressing these challenges. In addition, the flexible process of clustering ensemble broadens the application area of data clustering. Clustering ensemble has therefore received extensive attention from researchers and has produced many research results. However, due to the lack of supervision information, there is no systematic theoretical support for the effectiveness of clustering ensemble, and the elements closely related to that effectiveness are still unclear. These gaps limit the performance of clustering ensemble algorithms and hinder deeper research on clustering ensemble. Therefore, research on the effectiveness element theory and method of clustering ensemble has important theoretical significance and practical application value.

Following the general process of clustering ensemble, the thesis reveals the relationship between five elements and the effectiveness of clustering ensemble, where effectiveness mainly refers to the accuracy of the clustering ensemble result. The five elements are the base clustering set element, the cluster quality element, the data characteristic element, the relationship expression element, and the fusion strategy element. Then, to further improve the generalization ability of clustering ensemble, the thesis proposes algorithms that take these elements into account and analyzes the influence they have on clustering ensemble performance. Finally, a normal form for clustering ensemble effectiveness is formulated as A = f(B,C,D,E,F)? G. The main research contents and results of the thesis are as follows:

(1) A space structure-based categorical data clustering algorithm and a space structure-based mixed-type data clustering algorithm are proposed. Because the space structure of categorical and mixed-type data is unclear, a space structure representation scheme is proposed. Experimental analysis shows that, while effectively maintaining the distribution of the data in the original space, the space structure representation scheme provides richer measurement information than the original space. Based on this scheme, a categorical data clustering algorithm and a mixed-type data clustering algorithm are proposed. The proposed algorithms apply an efficient numerical data clustering algorithm to the space structure representation of the data to obtain the final clustering result. Experimental analysis shows that, compared with representative methods, the proposed algorithms achieve a significant improvement in clustering performance at comparable time cost.

(2) The inherent relation between the cluster quality element and the effectiveness of clustering ensemble in the ensemble selection stage is revealed. Evaluation in the ensemble selection stage is refined to the cluster level. A cluster quality measure based on set match degree, known as SME, is proposed. Theoretical analysis shows that SME can effectively handle the internal consistency failure problem and the external inconsistency failure problem of existing measures. Experimental analysis shows that weighting clusters with SME brings an obvious improvement in clustering ensemble performance. In addition, a novel selective clustering ensemble framework, known as DSME, is proposed. This framework takes into account the difference between the objective of the ensemble selection stage and that of the ensemble integration stage: diversity is the main consideration in the selection stage, while accuracy is the main consideration in the integration stage. Experimental analysis shows that, compared with other representative selective clustering ensemble methods, DSME embedded with SME improves clustering ensemble performance more significantly.

(3) The inherent relation between the data point characteristic element and the effectiveness of clustering ensemble in the ensemble integration stage is revealed. The relation is studied from the perspective of sample stability, which is defined as the average degree of determinacy of the relationship between a sample and the other samples. Sample stability reflects the contribution of a sample to correctly constructing the group structure and provides a metric for differentiating samples in clustering ensemble. Theoretical analysis shows that the definition of sample stability is reasonable, and experimental analysis on an image segmentation case demonstrates this visually. Further, a clustering ensemble algorithm based on sample stability is proposed, which applies targeted strategies to the stable sample set and the unstable sample set, respectively. Experiments on artificial data sets visually show the working mechanism of the proposed algorithm, and experiments on benchmark data sets verify its effectiveness and robustness.

(4) The inherent relation between the relationship expression element and the effectiveness of clustering ensemble in the ensemble integration stage is revealed. First, two limitations of the co-association relationship expression matrix are pointed out: high sparsity and low-value density. To address the high sparsity limitation, the shortest path technique is employed to reconstruct the relationship expression matrix. It is theoretically proven that the reconstructed relationship expression matrix can find more reasonable prototype samples, and this conclusion is visually verified by experimental analysis on two-dimensional artificial data sets. To address the low-value density limitation, a growing tree model is proposed. This model uses large margin theory to measure the confidence that a sample can be correctly classified, and then preferentially handles the samples with high confidence. Experiments on artificial data visually show the working mechanism of the growing tree model. Experimental analyses on benchmark data sets and image data sets illustrate the effectiveness of the proposed model.

(5) The inherent relation between the fusion strategy element and the effectiveness of clustering ensemble in the ensemble integration stage is revealed. A fusion strategy based on Dempster-Shafer evidence theory is proposed, which takes the data distribution information into account during the ensemble integration stage. It is theoretically proven that the proposed fusion strategy obtains correct integration results when the base clusterings are superior to a random partition. In addition, for the binary clustering problem, the proposed fusion strategy is proven to be superior to the voting strategy. A clustering ensemble method based on Dempster-Shafer evidence theory is proposed. Experimental analysis shows that clustering ensemble performance improves significantly after the proposed fusion strategy is introduced.

The above research results provide systematic theoretical support for the effectiveness of clustering ensemble, offer new ideas for the study of clustering ensemble theory, and give guidance for the design of clustering ensemble algorithms. The thesis enriches the research on clustering analysis and provides technical support for data analysis in complex data environments.
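
As a rough illustration of the co-association matrix and the idea of sample stability discussed in items (3) and (4), the following Python sketch may help. It is not the thesis's algorithm. The function names `co_association` and `sample_stability`, the toy base clusterings, and in particular the determinacy function |2c - 1| are assumptions chosen for illustration only; the abstract does not state the exact formula the thesis uses.

```python
import numpy as np

def co_association(labelings):
    """Fraction of base clusterings that place each pair of samples together."""
    labelings = np.asarray(labelings)  # shape: (n_clusterings, n_samples)
    # same[k, i, j] is True when clustering k puts samples i and j in one cluster
    same = labelings[:, :, None] == labelings[:, None, :]
    return same.mean(axis=0)

def sample_stability(ca):
    """Assumed stability: mean determinacy of a sample's pairwise relations.

    Determinacy is taken here as |2c - 1|: it equals 1 when the base
    clusterings agree completely (c = 0 or c = 1) and 0 when they are
    evenly split (c = 0.5). This choice is hypothetical.
    """
    n = ca.shape[0]
    determinacy = np.abs(2.0 * ca - 1.0)
    off_diagonal = ~np.eye(n, dtype=bool)  # ignore each sample's relation to itself
    return (determinacy * off_diagonal).sum(axis=1) / (n - 1)

# Four toy base clusterings of five samples (made-up data).
base = [
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
]
ca = co_association(base)
stability = sample_stability(ca)
print(np.round(ca, 2))
print(np.round(stability, 3))
```

In this toy input the base clusterings disagree only about the third sample, so it receives the lowest stability score under the assumed measure. A stability-aware ensemble method could treat such a sample differently from the ones whose relations are settled.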
Keywords/Search Tags:Clustering Ensemble, Clustering Analysis, Selective Clustering Ensemble, Clustering Ensemble Effectiveness, Cluster Quality Evaluation, Co-association Matrix, Sample's Stability, Fusion Strategy