With the rapid development of information technology, data has become increasingly easy to obtain. However, because the data itself is often rough, ambiguous, and uncertain, it is difficult to extract useful knowledge from massive datasets with complex structure and dynamic changes. Clustering ensemble has become one of the popular data mining methods in recent years for discovering hidden information in unlabeled datasets. Drawing on the idea of ensemble learning, it first generates a set of base clusterings, either by running several different clustering algorithms or by varying the initial parameters of a single clustering algorithm, and then applies a fusion function to obtain results that are more robust and effective than those of any single clustering algorithm. Research shows that selecting base clusterings that are both diverse and of high quality to participate in the ensemble is an important way to improve the final clustering result. However, existing base clustering selection methods often consider only the diversity of the base clusterings or only their quality. Therefore, this paper considers both the quality of the base clusterings and the diversity among them: it proposes measures of base clustering quality and diversity and, based on these, two different clustering ensemble algorithms. The specific research contents are as follows:

(1) To take both the quality and the diversity of base clusterings into account, this paper first proposes a measure that combines the two. Diversity among base clusterings is measured with a consistency measure, and clustering quality is measured mainly by the separation between clusters and the compactness of the samples within each cluster. Secondly, a weighted co-association matrix based on sample
similarity and a three-way decision clustering ensemble algorithm based on quality and diversity (3W_CBQD) are proposed. Finally, comparative experiments against traditional clustering algorithms and several common clustering ensemble methods show that the proposed model effectively improves the quality of the final clustering results and scales well.

(2) This paper proposes a clustering ensemble algorithm based on mutual information. Firstly, a second measure combining the diversity and quality of base clusterings is proposed: diversity is measured with a modified form of mutual information, and quality is measured with mutual information. Secondly, a new iterative base clustering selection and optimization algorithm based on mutual information (CSOM) is proposed. Finally, comparisons with the 3W_CBQD algorithm and other clustering ensemble algorithms show that the algorithm is significantly effective.

Both algorithms start from the quality and diversity of the base clusterings and introduce several measures for evaluating them. Experiments show that the clustering ensemble algorithms proposed in this paper achieve good results.
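
To make the general pipeline described above concrete (base clusterings, a mutual-information-based weight, a weighted matrix of pairwise sample agreement, and a fusion step), the following is a minimal illustrative sketch. It is not the 3W_CBQD or CSOM algorithm: the function names `nmi`, `quality_weights`, `weighted_co_association`, and `consensus` are hypothetical, the weight (average normalized mutual information with the other base clusterings) is only a common stand-in for a quality measure, and the fusion step (thresholding at 0.5 and taking connected components) is an assumed simple choice.

```python
import math
from collections import Counter


def nmi(a, b):
    """Normalized mutual information between two labelings (sqrt normalization)."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum((nij / n) * math.log(nij * n / (ca[i] * cb[j]))
             for (i, j), nij in cab.items())
    ha = -sum((c / n) * math.log(c / n) for c in ca.values())
    hb = -sum((c / n) * math.log(c / n) for c in cb.values())
    if ha == 0 or hb == 0:
        # Convention: two single-cluster labelings agree fully, otherwise no information.
        return 1.0 if ha == hb else 0.0
    return mi / math.sqrt(ha * hb)


def quality_weights(base):
    """Assumed quality proxy: average NMI of each base clustering with all the others.

    Requires at least two base clusterings.
    """
    k = len(base)
    return [sum(nmi(base[p], base[q]) for q in range(k) if q != p) / (k - 1)
            for p in range(k)]


def weighted_co_association(base, weights):
    """Entry (i, j) is the weighted fraction of base clusterings that group i with j."""
    n = len(base[0])
    total = sum(weights)
    if total == 0:
        # Fall back to uniform weights if every weight is zero.
        weights, total = [1.0] * len(base), float(len(base))
    matrix = [[0.0] * n for _ in range(n)]
    for labels, w in zip(base, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += w / total
    return matrix


def consensus(matrix, threshold=0.5):
    """Assumed fusion step: connected components of the graph with edges above threshold."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three toy base clusterings of six samples (made-up data for illustration only).
base = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition as the first, with labels swapped
    [0, 0, 1, 1, 2, 2],  # a less consistent partition
]
weights = quality_weights(base)
matrix = weighted_co_association(base, weights)
print(consensus(matrix))  # [0, 0, 0, 1, 1, 1]
```

In this toy example the third base clustering agrees less with the other two, so it receives a smaller weight and its disagreeing pairings fall below the threshold in the fused result.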