Research On Ensemble Clustering Algorithm

Posted on: 2022-07-28
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Xue
Full Text: PDF
GTID: 2518306527483024
Subject: Computer Science and Technology
Abstract/Summary:
Clustering is an important research topic in pattern recognition and data mining and has attracted wide attention from researchers. Many clustering algorithms proposed in recent years perform well on datasets with clear cluster structure. However, a single clustering algorithm rarely produces good results on datasets with complex structure, indistinct boundaries, non-spherical distributions, or high dimensionality. Ensemble clustering can address these problems. This thesis studies ensemble clustering from several aspects and proposes improvements to it. The specific research contents are as follows:

1. An ensemble clustering algorithm based on weighted super-clusters is proposed. Most ensemble clustering algorithms use K-means to generate the base clusterings, which yields base clusterings of poor quality. When the co-association matrix is used to combine the base clusterings, their diversity is ignored and all base clusterings are treated equally. In addition, when the number of samples or the ensemble size is large and the co-association matrix is built with individual samples as the operating unit, the efficiency of the algorithm drops. A super-cluster weighted ensemble clustering method based on landmark sampling can solve these problems. The method consists of three steps. In the first step, landmark points are obtained by combining random selection with K-means-based selection, spectral clustering is applied to the landmark points, and each sample is then mapped to its nearest landmark point to generate a base clustering. In the second step, super-clusters are obtained by applying a defragmenting strategy to the intersecting clusters, the uncertainty of each base cluster is computed from its information entropy, a corresponding weight is assigned to each base clustering, and the weighted co-association matrix of the super-clusters is constructed. In the third step, a hierarchical clustering algorithm is applied to the co-association matrix to obtain the ensemble clustering result. Experimental results show that the algorithm improves the effectiveness and performance of ensemble clustering.

2. An ensemble clustering algorithm based on representative points and low-rank representation is proposed. After obtaining the base clusterings, traditional ensemble clustering algorithms usually construct the co-association matrix with all samples as the operating units and apply a clustering algorithm directly to that matrix to obtain the ensemble result. Noise in the samples, such as outliers, corrupts the co-association matrix and reduces the performance of ensemble clustering. Scalable ensemble clustering via building a dense representation matrix can solve these problems. The algorithm consists of three steps. In the first step, a binary membership matrix is obtained from the base clusterings, the samples are divided into several groups, and representative points are selected according to the binary membership values within each group. In the second step, a co-association matrix is constructed from the representative points and refined through low-rank representation. In the third step, a graph clustering algorithm is applied to the co-association matrix to obtain the ensemble clustering result. Experimental results show the effectiveness of the algorithm.

3. A text clustering algorithm based on ensemble clustering is proposed. Current text clustering algorithms obtain the final result mainly through partition-based, hierarchical, density-based, grid-based, or model-based methods. This thesis instead uses an ensemble clustering algorithm within the text clustering framework to obtain the clustering result. Experiments on three text datasets show that the algorithm performs well.
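
The weighting and consensus idea behind the first method can be illustrated with a small sketch. It is not the thesis's algorithm: it works on individual samples rather than super-clusters, skips landmark sampling and spectral clustering, and uses an assumed weighting rule (weight = exp(-mean entropy), where the entropy measures how each cluster of a base clustering is split by the other base clusterings). The function names `cluster_entropy`, `clustering_weights`, `co_association`, and `average_linkage` are hypothetical and chosen only for this illustration.

```python
from itertools import combinations
from math import exp, log2


def cluster_entropy(members, other_labels):
    """Entropy of how one cluster's members are split by another clustering."""
    counts = {}
    for i in members:
        counts[other_labels[i]] = counts.get(other_labels[i], 0) + 1
    size = len(members)
    return -sum((c / size) * log2(c / size) for c in counts.values())


def clustering_weights(base_clusterings):
    """Assumed rule: weight = exp(-mean entropy of a clustering's clusters)."""
    weights = []
    for a, labels in enumerate(base_clusterings):
        clusters = {}
        for i, lab in enumerate(labels):
            clusters.setdefault(lab, []).append(i)
        others = [b for k, b in enumerate(base_clusterings) if k != a]
        h = [cluster_entropy(m, o) for m in clusters.values() for o in others]
        weights.append(exp(-sum(h) / len(h)) if h else 1.0)
    return weights


def co_association(base_clusterings, weights):
    """Entry [i][j] is the weighted share of clusterings grouping i with j."""
    n = len(base_clusterings[0])
    total = sum(weights)
    ca = [[0.0] * n for _ in range(n)]
    for labels, w in zip(base_clusterings, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    ca[i][j] += w / total
    return ca


def average_linkage(ca, k):
    """Agglomerative clustering that merges the most co-associated pair."""
    clusters = [[i] for i in range(len(ca))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pairs = len(clusters[a]) * len(clusters[b])
            sim = sum(ca[i][j] for i in clusters[a] for j in clusters[b]) / pairs
            if best is None or sim > best[0]:
                best = (sim, a, b)
        _, a, b = best
        clusters[a] += clusters[b]  # a < b, so deleting b keeps index a valid
        del clusters[b]
    labels = [0] * len(ca)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


# Three toy base clusterings of six samples; the second one disagrees
# with the other two about sample 2.
base_clusterings = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
]
weights = clustering_weights(base_clusterings)
labels = average_linkage(co_association(base_clusterings, weights), k=2)
print(weights)
print(labels)
```

In this toy input the second base clustering receives a lower weight than the other two because its clusters are split more by the rest of the ensemble, so it contributes less to the co-association matrix before the hierarchical step.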
Keywords/Search Tags: Ensemble clustering, Co-association matrix, Weighted strategy, Low-rank representation, Text clustering