| Mining valuable information from the vast amounts of richly structured data generated in the Internet era has become an active and difficult research area in machine learning. Clustering ensemble techniques have become a focus of this research and are used in various industries because of their strong performance and unsupervised nature. Most clustering ensemble techniques design their integration strategies around the base clustering results alone, which makes it difficult to obtain good results when the base clustering results are generally poor. This thesis instead uses both the sample data and the base clustering results in the clustering ensemble process and solves the resulting model with a genetic algorithm. The Fast Nondominated Sorting Genetic Algorithm for Multi-Objective Clustering Ensemble (NSGAMCE) is proposed first. The model formulates a multi-objective set of consistency objective functions, one for the sample data and one for the base clustering results, so that both levels jointly guide the optimisation toward a consensus solution. The model first defines the clustering ensemble task in terms of the genetic algorithm, then proposes a reduction encoding strategy and an adaptive mutation probability to address the curse of dimensionality and the local search problems that the genetic algorithm encounters when solving the clustering ensemble, and finally uses the genetic algorithm to solve the set of objective functions. This thesis also proposes a Pairwise Constraints Guide Fast Nondominated Sorting Genetic Algorithm for Multi-Objective Clustering Ensemble Algorithm (pc NSGAMCE), which follows the same idea of fusing sample data with base clustering results. In this algorithm, pairwise constraint information is incorporated into the multi-objective set of clustering ensemble objective functions to guide the genetic algorithm as it iteratively solves the problem. Finally, the two proposed algorithms are evaluated experimentally on public datasets. Classical and state-of-the-art clustering ensemble algorithms are selected for comparison, and the results are evaluated using accuracy, purity and normalized mutual information as metrics. The experimental results show that the two algorithms proposed in this thesis outperform the compared algorithms. |
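
Two of the evaluation metrics named above, purity and normalized mutual information (NMI), can be illustrated with a minimal Python sketch. The function names `purity`, `entropy` and `nmi` are hypothetical and not taken from the thesis. The geometric-mean normalization of NMI is an assumption, since the abstract does not say which variant is used.

```python
from collections import Counter
from math import log, sqrt


def purity(labels_true, labels_pred):
    """Fraction of samples belonging to the majority true class of their cluster."""
    clusters = {}
    for t, p in zip(labels_true, labels_pred):
        # One class-count table per predicted cluster.
        clusters.setdefault(p, Counter())[t] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(labels_true)


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(labels_true, labels_pred):
    """Mutual information normalized by sqrt(H(true) * H(pred)).

    Assumption: geometric-mean normalization; other variants divide by the
    arithmetic mean, the minimum or the maximum of the two entropies.
    """
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    mi = sum(
        (nij / n) * log(n * nij / (count_true[t] * count_pred[p]))
        for (t, p), nij in joint.items()
    )
    h_true, h_pred = entropy(labels_true), entropy(labels_pred)
    if h_true == 0 or h_pred == 0:
        # Degenerate case: at least one partition has a single group.
        return 1.0 if h_true == h_pred else 0.0
    return mi / sqrt(h_true * h_pred)


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    consensus = [0, 0, 1, 1, 1, 1]  # hypothetical ensemble output
    print(purity(truth, consensus), nmi(truth, consensus))
```

Both metrics are invariant to how cluster labels are numbered, so a consensus partition that matches the ground truth up to a relabelling scores 1. Accuracy, by contrast, needs an explicit best matching between cluster labels and classes, so it is left out of this sketch.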