| Clustering ensembles obtain better and more robust clustering results by learning from and fusing multiple base clusterings of a data set. In recent years, they have received extensive attention from researchers and have been widely used in image processing, community discovery, recommendation systems, text mining, and other fields. However, most existing clustering ensemble algorithms consider only the structural information of the base clusterings and ignore the original content information of the data set during integration, and they integrate only single-type data, either numerical or categorical, without considering how widespread mixed data are. For this reason, this paper conducts a series of studies on clustering ensembles for mixed data that consider both structure and content information in the ensemble process. Specifically, the main research content of this paper is as follows. (1) To address the problem of how to effectively use the content information of object attributes and the structural information between objects in the ensemble process, an ensemble clustering algorithm via content and structure is proposed. First, the content information of attributes between objects is calculated through the extended Euclidean distance; second, based on the base clustering information, the co-occurrence relationship between objects is used to obtain the structural information between objects; then, based on the effective fusion of content and structural information, the ensemble information matrix between objects and classes is constructed; finally, the final clustering result is obtained by applying graph clustering to this matrix. The validity of the proposed algorithm is verified by comparison with existing algorithms on UCI data sets. (2) To address the uneven quality and differing contributions of base clusterings in the ensemble process, an ensemble clustering selection algorithm via content and structure information was 
proposed. First, a clustering quality evaluation index for mixed data is defined by fusing content and structure information; then, an iterative method is used to select base clusterings of high quality and strong diversity for integration. Comparison with existing clustering ensemble selection algorithms on multiple UCI data sets verifies the effectiveness of the proposed algorithm experimentally. (3) A clustering ensemble analysis system is designed and developed. The system includes functions such as data import, algorithm introduction, and result display, enabling comparison of results between multiple algorithms on different data sets. The research results of this paper not only enrich the research content of cluster analysis, but also have important reference value for the analysis and mining of mixed data, community discovery in attributed networks, and other related research. |
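
The first steps of contribution (1) can be illustrated with a minimal sketch. The abstract does not define the extended Euclidean distance or the fusion rule, so the forms used here are assumptions: numeric attributes are min-max scaled, each categorical mismatch counts as 1 under the square root, and content and structure are combined by a convex weight `alpha`. The function names `mixed_distance`, `co_association`, and `fused_similarity` are hypothetical, and the object-class matrix and graph clustering steps are not shown.

```python
import numpy as np


def mixed_distance(num, cat):
    """Pairwise distance for mixed data (assumed form, not the paper's definition):
    squared differences of scaled numeric attributes plus 0/1 categorical
    mismatches, combined under one square root."""
    num = np.asarray(num, dtype=float)
    cat = np.asarray(cat)
    # Min-max scale numeric columns so they are comparable to 0/1 mismatches.
    span = num.max(axis=0) - num.min(axis=0)
    span[span == 0] = 1.0  # avoid division by zero for constant columns
    scaled = (num - num.min(axis=0)) / span
    sq_num = ((scaled[:, None, :] - scaled[None, :, :]) ** 2).sum(axis=2)
    mismatches = (cat[:, None, :] != cat[None, :, :]).sum(axis=2)
    return np.sqrt(sq_num + mismatches)


def co_association(base_labels):
    """Structure information: fraction of base clusterings in which
    each pair of objects falls in the same cluster."""
    base_labels = np.asarray(base_labels)  # shape (n_clusterings, n_objects)
    same = base_labels[:, :, None] == base_labels[:, None, :]
    return same.mean(axis=0)


def fused_similarity(num, cat, base_labels, alpha=0.5):
    """Convex combination of content and structure similarity.
    The weight alpha is an illustrative assumption."""
    dist = mixed_distance(num, cat)
    max_d = dist.max()
    # Turn distances into similarities in [0, 1].
    content = 1.0 - dist / max_d if max_d > 0 else np.ones_like(dist)
    structure = co_association(base_labels)
    return alpha * content + (1.0 - alpha) * structure


if __name__ == "__main__":
    num = [[1.0], [1.2], [5.0], [5.1]]      # one numeric attribute
    cat = [["a"], ["a"], ["b"], ["b"]]      # one categorical attribute
    base_labels = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
    print(np.round(fused_similarity(num, cat, base_labels), 3))
```

The resulting symmetric matrix could then be passed to any graph or spectral clustering routine that accepts a precomputed affinity. That choice is independent of the sketch above.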