| Clustering ensembles have become a key extension of the classical clustering problem. They provide a flexible framework for reconciling the inconsistencies among base clusterings. Although clustering ensemble methods achieve good performance, they adapt poorly to datasets that contain samples with negative effects. Most existing clustering ensemble methods focus on weighting or selecting base clusterings and clusters, and then use all samples to learn the consensus clustering, without fully considering the adverse effect that hard-to-cluster, unreliable samples have on the result. However, three-way clustering shows that different samples play different roles in discovering the underlying data structure. This paper therefore divides samples from different perspectives to address the shortcoming that existing schemes treat all samples alike. In addition, different methods are used to improve the accuracy of the sample division and thereby the clustering performance. The specific research content is as follows: (1) This paper proposes a selective clustering ensemble algorithm based on sample reliability (CESD), which computes the reliability of each sample from a co-association matrix based on quotient space theory. A three-way selection algorithm for base clusterings, based on a set distance measure, is then used to exclude the influence of low-quality base clusterings on the sample division. Finally, the clustering structure obtained from the reliable samples is used to assign the unreliable samples. Qualitative comparative experiments against classical clustering ensemble algorithms and recent clustering ensemble schemes based on sample processing indicate that this study offers a new research approach and theoretical basis for clustering ensemble methods based on sample division, and therefore has research value. (2) This
paper also proposes a sample-reliability clustering ensemble algorithm based on information entropy (CESD-I), which improves on the CESD algorithm. Information entropy is used to calculate each sample's contribution to the uncertainty of the base clusterings: reliable samples contribute relatively little, while unreliable samples contribute relatively much. A quality measure for base clusterings based on cluster consistency and an iterative selection method based on quality and diversity are then used to mitigate a drawback of CESD, in which quality-based selection can yield multiple similar base clusterings that harm the sample division. Finally, qualitative comparative experiments are conducted against CESD and other clustering ensemble algorithms based on sample processing, and the soundness of the method is analyzed on the basis of the experimental results. |
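
The general idea of dividing samples by reliability and then assigning the unreliable ones can be illustrated with a minimal sketch. The abstract does not give the actual CESD or CESD-I formulas, so everything below is an assumption made for illustration: the uncertainty score (mean binary entropy of a sample's co-association values), the threshold, the connected-components step for clustering reliable samples, and all function names are hypothetical. The quotient-space computation and the selection of base clusterings are not modelled.

```python
import math

def co_association(base_clusterings):
    """Fraction of base clusterings that place samples i and j in the same cluster."""
    m, n = len(base_clusterings), len(base_clusterings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in base_clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]

def binary_entropy(p):
    """Entropy (in bits) of a Bernoulli variable with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def sample_uncertainty(ca):
    """Assumed score: mean entropy of a sample's co-association with every other sample."""
    n = len(ca)
    return [sum(binary_entropy(ca[i][j]) for j in range(n) if j != i) / (n - 1)
            for i in range(n)]

def split_samples(uncertainty, threshold):
    """Samples at or below the threshold are treated as reliable, the rest as unreliable."""
    reliable = [i for i, u in enumerate(uncertainty) if u <= threshold]
    unreliable = [i for i, u in enumerate(uncertainty) if u > threshold]
    return reliable, unreliable

def cluster_reliable(ca, reliable, link=0.5):
    """Stand-in consensus step: connected components over links with co-association > link."""
    labels, next_label = {}, 0
    for s in reliable:
        if s in labels:
            continue
        labels[s] = next_label
        stack = [s]
        while stack:
            u = stack.pop()
            for v in reliable:
                if v not in labels and ca[u][v] > link:
                    labels[v] = next_label
                    stack.append(v)
        next_label += 1
    return labels

def assign_unreliable(ca, labels, unreliable):
    """Give each unreliable sample the reliable cluster with the highest mean co-association."""
    clusters = {}
    for s, c in labels.items():
        clusters.setdefault(c, []).append(s)
    result = dict(labels)
    for u in unreliable:
        result[u] = max(
            clusters,
            key=lambda c: sum(ca[u][s] for s in clusters[c]) / len(clusters[c]),
        )
    return result

# Toy input: four base clusterings of six samples; only sample 2 changes cluster.
base = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
ca = co_association(base)
reliable, unreliable = split_samples(sample_uncertainty(ca), threshold=0.5)
final = assign_unreliable(ca, cluster_reliable(ca, reliable), unreliable)
print(reliable, unreliable, final)
```

On this toy input, sample 2 is the only sample flagged as unreliable, and it is assigned to the cluster that contains samples 0 and 1, with which it co-occurs most often.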