| With the explosive growth of information in the digital age, data mining has become an important means of discovering valuable information. Within data mining, clustering ensemble is a research hotspot whose general process consists of two stages: 1) generation of base clusterings; 2) consensus clustering, i.e., fusion of the base clustering results. Generating high-quality base clusterings and designing appropriate consensus functions both remain difficult. This work addresses the difficulties in these two stages using simplified density clustering, local weighting, and inter-cluster similarity calculation. The main contributions are as follows. (1) To reduce the difficulty of parameter selection while ensuring high-quality base clusterings, a local density weighted clustering ensemble method (LWCE) is proposed. First, a density clustering method (NNDC) is designed that reduces both the number of parameters to be selected and the algorithm complexity. Second, the similarity matrix generated by NNDC is locally weighted with the similarity matrix generated by another clustering method (k-means) to obtain the final co-occurrence matrix. Finally, the co-occurrence matrix is partitioned with Normalized Cut (NCUT) to obtain the final clustering result. Experiments show that this method achieves better clustering performance than other clustering ensemble methods and can stably and effectively identify clusters of arbitrary shape. (2) Traditional similarity-based clustering ensemble methods face two problems when generating a co-occurrence matrix: 1) local noise can easily affect the final clustering result; 2) time and space complexity are high. To address these problems, a clustering ensemble method based on inter-cluster similarity (CSCE) is proposed. First, an evidence accumulation model is used to compute the similarity between the original objects and divide them into multiple small clusters. Then, a new similarity measure is used to compute the similarity between clusters, forming a cluster-level similarity matrix; this avoids some problems of point-based clustering algorithms, such as the influence of noise and outliers on the clustering result. Finally, the cluster similarity matrix is partitioned with the Normalized Cut (NCUT) graph-cut method to obtain the final clustering result. Experimental results show that this method achieves better clustering performance and time efficiency than traditional co-occurrence-matrix-based and graph-based clustering ensemble methods. (3) To improve experimental and research efficiency, a prototype system for the similarity-based locally weighted clustering ensemble algorithm was designed and implemented. The prototype includes functional modules for data import, similarity matrix construction, clustering result generation, and result saving. Running the system demonstrates its effectiveness and makes the method accessible to researchers with little coding experience. |
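
The first two steps of the CSCE pipeline described in the abstract (evidence accumulation into small clusters, then a cluster-level similarity matrix) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the threshold-based grouping, and the use of mean co-occurrence as the inter-cluster similarity are all assumptions made for illustration, since the abstract does not specify the actual similarity measure. The final NCUT partitioning step is omitted.

```python
from itertools import combinations


def co_association(partitions):
    """Evidence accumulation: fraction of base clusterings in which
    each pair of objects is placed in the same cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def small_clusters(co, threshold):
    """Hypothetical grouping step: connected components of the graph
    that links objects whose co-occurrence is at least `threshold`."""
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def cluster_similarity(co, labels):
    """Assumed inter-cluster similarity: mean co-occurrence between
    the members of two small clusters (not the measure from the source)."""
    k = max(labels) + 1
    members = [[i for i, c in enumerate(labels) if c == a] for a in range(k)]
    sim = [[1.0] * k for _ in range(k)]
    for a, b in combinations(range(k), 2):
        total = sum(co[i][j] for i in members[a] for j in members[b])
        sim[a][b] = sim[b][a] = total / (len(members[a]) * len(members[b]))
    return sim


# Three made-up base clusterings of six objects (labels are arbitrary ids).
base = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]
co = co_association(base)
groups = small_clusters(co, threshold=0.9)
sim = cluster_similarity(co, groups)
```

Working at the cluster level shrinks the matrix handed to the final partitioning step from one row per object to one row per small cluster, which is the source of the time and space savings the abstract claims. In a fuller sketch, `sim` would be passed as a precomputed affinity to a normalized-cut or spectral partitioning routine.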