
Research On Ensemble Clustering Algorithm Based On Improved Co-Association Matrix

Posted on: 2021-11-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Liu
GTID: 2518306104987269
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Cluster analysis is one of the most common research directions in unsupervised learning, and it is widely used in fields such as product recommendation in e-commerce and content distribution for news and short videos. With the advent of the big data era, both the sample size and the dimensionality of clustering data sets keep growing, and a single clustering algorithm becomes difficult to apply in these situations. Ensemble clustering algorithms, which draw on ensemble learning, emerged in response. After studying several types of ensemble clustering algorithms based on different theories, this thesis finds shortcomings in existing ensemble clustering algorithms: for example, they do not consider the distance between data sample points and the cluster center, or the differences in clustering performance among base clusterers. The room for improvement lies mainly at three levels: data sample points, the clusters obtained by the base clusterers, and the base clusterings themselves. This thesis therefore focuses on the construction of the co-association matrix in ensemble clustering and proposes improvements at these three levels, as follows.

To improve ensemble clustering at the data sample point level, this thesis proposes a method for calculating the membership probability of data samples based on the Parzen window principle and applies it to the construction of the co-association matrix. Comparative experiments show that, compared with the classical ensemble clustering algorithm, the improvement at the data sample point level brings an average improvement of 151.02% on commonly used clustering evaluation indicators.

To improve ensemble clustering at the level of the clusters obtained by the base clusterers, this thesis proposes a method based on KL divergence for calculating the importance of each cluster, which improves the construction of the co-association matrix at the cluster level. Comparative experiments show that, compared with other ensemble clustering algorithms improved at the data sample point and cluster levels, combining the improvements at these two levels brings an average improvement of 5.53% in clustering performance.

To measure the clustering performance of different base clusterers and improve ensemble clustering at the level of base clustering results, this thesis proposes a method for weighting base clusterers based on the Pareto optimality principle, in which base clusterers receive different weights according to their performance. Comparative experiments against ensemble clustering algorithms that are also improved at multiple levels show that combining the improvements at all three levels (data sample points, clusters, and base clustering results) achieves an average improvement of 46.16% in clustering performance.
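
For readers unfamiliar with the co-association matrix that the abstract builds on, the following is a minimal sketch of the classical construction only: entry (i, j) is the fraction of base clusterings that place samples i and j in the same cluster. It is not the thesis's improved method. The function name `co_association`, the optional `weights` argument (a generic per-base-clustering weighting), and the toy labelings are assumptions made for illustration; the Parzen window, KL divergence, and Pareto weighting schemes are not reproduced here.

```python
import numpy as np

def co_association(labelings, weights=None):
    """Classical co-association matrix (illustrative sketch).

    labelings: array-like of shape (M, N), integer cluster labels from
               M base clusterings of the same N samples.
    weights:   optional length-M weights for the base clusterings
               (hypothetical hook; uniform weights give the classical matrix).
    """
    labelings = np.asarray(labelings)
    m, n = labelings.shape
    if weights is None:
        weights = np.full(m, 1.0 / m)
    else:
        weights = np.asarray(weights, dtype=float)
        weights = weights / weights.sum()  # normalize so entries stay in [0, 1]

    ca = np.zeros((n, n))
    for w, labels in zip(weights, labelings):
        # Indicator matrix: 1 where two samples share a cluster in this base clustering.
        ca += w * (labels[:, None] == labels[None, :])
    return ca

# Toy example: three base clusterings of four samples (made-up labels).
labelings = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
ca = co_association(labelings)
print(np.round(ca, 3))
```

A consensus partition is then typically obtained by treating the matrix as a similarity matrix and clustering it, for example with hierarchical clustering on `1 - ca`.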
Keywords/Search Tags:Ensemble clustering, co-association matrix, Parzen window, KL-divergence, Pareto optimality