
Research On Semi-supervised Selective Clustering Ensemble

Posted on: 2021-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: X C Huang
Full Text: PDF
GTID: 2428330611997628
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With the development of information technology, people have entered the era of big data and generate large amounts of data in daily life, anytime and anywhere. How to extract the valuable information hidden in this mass of data has become a research hotspot. Clustering is one of the most commonly used techniques in data mining. Unlike traditional cluster analysis with a single algorithm, a clustering ensemble designs a consensus function to fuse multiple different clustering results and can thereby obtain better results than a single clustering algorithm. However, the members produced during ensemble member generation differ in structure, so selecting appropriate members is very important to the final clustering result. Some scholars therefore apply selection strategies that keep only members of good quality and high diversity for the ensemble, so as to obtain better clustering results. At present, research on clustering ensembles and clustering ensemble selection focuses mainly on unsupervised learning and does not consider the prior knowledge provided by users or experts. Semi-supervised clustering ensemble brings a small amount of labeled data into the ensemble to supervise and guide the process, which yields better clustering results and makes the whole process more stable, accurate and robust.

Inspired by this, this thesis combines selective clustering ensemble with semi-supervised clustering. First, a subset of the initial clustering members is selected by a member selection method based on the quality and diversity of the members. Then, drawing on the key idea of semi-supervised clustering ensemble, prior knowledge such as pairwise constraints is used to bring semi-supervised information into the selective clustering ensemble process, giving a semi-supervised selective clustering ensemble method (SSCES).

Because the data produced in daily life is increasingly high-dimensional, this thesis analyzes existing dimension reduction algorithms and, building on principal component analysis (PCA), proposes a pairwise-constrained semi-supervised clustering ensemble algorithm based on PCA dimension reduction (SSCEDR). Since PCA is an unsupervised dimension reduction method and does not exploit some of the useful information in the data, this thesis also adds a distance function with positive and negative constraints to the PCA objective function to form a new semi-supervised dimension reduction algorithm (SSDR). SSDR first reduces the dimension of the original data; then, in the reduced space, the prior knowledge is incorporated into the clustering ensemble process through semi-supervised clustering ensemble to obtain the final clustering result. Experiments on multiple data sets show that the proposed algorithms improve clustering quality and obtain better clustering results.
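The abstract does not give the SSCES algorithm itself, so the following is only a minimal sketch of the general idea of selecting ensemble members and then fusing them under pairwise constraints. Everything in it is an assumption made for illustration: the function names, the toy partitions, the use of constraint satisfaction as the only quality score (the thesis also uses member diversity, which is omitted here), and the thresholded co-association consensus. Positive and negative pairwise constraints are written as `must_link` and `cannot_link`.

```python
from itertools import combinations

def constraint_score(labels, must_link, cannot_link):
    """Fraction of pairwise constraints that one partition satisfies."""
    satisfied = sum(labels[i] == labels[j] for i, j in must_link)
    satisfied += sum(labels[i] != labels[j] for i, j in cannot_link)
    total = len(must_link) + len(cannot_link)
    return satisfied / total if total else 0.0

def select_members(partitions, must_link, cannot_link, keep):
    """Keep the `keep` partitions with the highest constraint score.
    Assumption: quality only; a diversity term is not modelled here."""
    ranked = sorted(
        partitions,
        key=lambda p: constraint_score(p, must_link, cannot_link),
        reverse=True,
    )
    return ranked[:keep]

def co_association(partitions, n):
    """co[i][j] = fraction of partitions that put i and j in one cluster."""
    co = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            same = sum(p[i] == p[j] for p in partitions)
            co[i][j] = same / len(partitions)
    return co

def consensus_labels(co, must_link, cannot_link, threshold=0.5):
    """Link pairs whose co-association exceeds `threshold` and take the
    connected components as clusters. Must-link pairs are always linked;
    cannot-link pairs are never linked directly (they can still end up
    together through other points, which this sketch does not prevent)."""
    n = len(co)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    forced = {frozenset(pair) for pair in must_link}
    blocked = {frozenset(pair) for pair in cannot_link}
    for i, j in combinations(range(n), 2):
        pair = frozenset((i, j))
        if pair in blocked:
            continue
        if pair in forced or co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

# Toy data (hypothetical): six points, four base clusterings.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],
    [1, 1, 1, 0, 0, 0],
]
must_link = [(0, 1), (3, 4)]
cannot_link = [(2, 3)]

selected = select_members(partitions, must_link, cannot_link, keep=3)
co = co_association(selected, n=6)
labels = consensus_labels(co, must_link, cannot_link)
print(labels)
```

In this toy run the third partition violates both must-link constraints and is dropped, and the remaining three members agree often enough to separate the first three points from the last three. A real implementation would generate the base clusterings from data and would usually replace the connected-components step with a stronger consensus function.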
Keywords/Search Tags:clustering ensemble, semi-supervised clustering ensemble, members selection, pairwise constraints, dimension reduction, principal component analysis