| With the rapid development of Internet technology, massive amounts of high-dimensional complex data are produced every minute. How to extract valuable information from such data is a frontier problem in data mining and knowledge discovery. Existing clustering analysis methods have extracted useful information from many high-dimensional complex data sets. Still, there is no satisfactory way to handle the uncertain relationship between complex data objects and clusters. Improving existing algorithms so that they can cope effectively with this challenging setting is therefore a topic worth exploring, with both theoretical significance and practical value. Cluster analysis has attracted extensive attention because it does not need label information for the given samples: it only measures the relationships among the data and then identifies the latent structural features in them. However, a single clustering algorithm often relies on an idealized assumption about the data distribution. A clustering ensemble aims to integrate multiple base cluster members to obtain a unified data partition. However, traditional clustering ensemble methods still have many deficiencies when facing complex data. For example, known prior information is rarely used, the different contributions of different base cluster members are ignored, and the uncertain membership relationship between objects and clusters is difficult to describe. Therefore, to address these problems of traditional cluster analysis, this paper studies in depth the combination of semi-supervised learning, ensemble learning, and active learning from the perspective of three-way decision. A semi-supervised three-way clustering ensemble algorithm based on label propagation is proposed to obtain the consistency information shared by all base cluster members. First, the label propagation algorithm generates multiple sets of diverse
base cluster members. Then, the base cluster member sets generated by the label propagation algorithm are reconstructed using the idea of three-way decision, yielding the base cluster sets of a new three-way label propagation algorithm. Considering that the objects in different base clusters lie in different regions, a semi-supervised three-way clustering ensemble model is constructed. It transforms the different base cluster members into the same member matrix representation and uses different strategies to integrate the objects in different regions, so as to better describe the similarity relationship between objects and construct a consistent similarity matrix. Experiments on several data sets show that the proposed algorithm is effective. Because existing clustering ensemble algorithms have difficulty describing the uncertain membership relationship between objects and clusters in complex data, this work adopts the three-way clustering representation. Three-way clustering represents a cluster by a pair of sets, namely a core region and a boundary region, which describes the fuzzy boundaries of clusters more accurately and effectively captures the uncertain membership relationship between objects and clusters. Building on these advantages, this paper proposes a semi-supervised three-way clustering ensemble optimization algorithm based on active learning. Active learning finds informative objects among the unlabeled data and updates the consistency similarity matrix after pairwise queries, which can effectively improve clustering performance. At the same time, with the help of three-way clustering, the search space can be restricted globally to the boundary regions, which significantly improves the efficiency of finding informative objects. Comparative experiments on multiple data sets show the effectiveness of the proposed
algorithm. |
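
The ideas summarized in the abstract (a similarity matrix built from several base clusterings, clusters represented by a core region and a boundary region, and pairwise queries restricted to boundary objects) can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper: the co-association matrix is a standard ensemble construction swapped in for the paper's similarity matrix, and the thresholds `alpha` and `beta`, the function names, and the toy labels are all assumptions made for illustration.

```python
import numpy as np

def co_association(base_labels):
    """Fraction of base clusterings that place each pair of objects together."""
    labels = np.asarray(base_labels)                 # shape: (n_members, n_objects)
    same = labels[:, :, None] == labels[:, None, :]  # pairwise agreement per member
    return same.mean(axis=0)

def three_way_regions(sim, reference_labels, alpha=0.8, beta=0.4):
    """Split objects into a core and a boundary region for each cluster.

    Assumption: an object's support for a cluster is its mean similarity to the
    cluster's members in a reference partition (self-similarity included).
    alpha and beta are hypothetical thresholds, not values from the paper.
    """
    reference_labels = np.asarray(reference_labels)
    regions = {}
    for c in np.unique(reference_labels):
        members = reference_labels == c
        support = sim[:, members].mean(axis=1)
        core = set(np.flatnonzero(support >= alpha).tolist())
        boundary = set(np.flatnonzero((support >= beta) & (support < alpha)).tolist())
        regions[int(c)] = (core, boundary)
    return regions

def boundary_candidates(regions):
    """Objects in any boundary region: the restricted search space for queries."""
    found = set()
    for _core, boundary in regions.values():
        found |= boundary
    return sorted(found)

def apply_pairwise_answer(sim, i, j, same_cluster):
    """Return a copy of sim updated with the answer to one pairwise query."""
    updated = sim.copy()
    updated[i, j] = updated[j, i] = 1.0 if same_cluster else 0.0
    return updated

# Toy ensemble: three base clusterings of six objects that disagree on objects 2 and 3.
base_labels = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
sim = co_association(base_labels)
regions = three_way_regions(sim, base_labels[0])
candidates = boundary_candidates(regions)
sim_after_query = apply_pairwise_answer(sim, 2, 3, same_cluster=False)
```

On this toy input, objects 0 and 1 form the core of one cluster, objects 4 and 5 the core of the other, and the disputed objects 2 and 3 fall into the boundary of both, so only those two objects are considered for a pairwise query.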