Research On Efficient Clustering Ensemble Algorithm Based On Random Subspace

Posted on: 2019-11-16
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Yan
Full Text: PDF
GTID: 2428330551958741
Subject: Computer application technology
Abstract/Summary:
Cluster analysis is an effective method for handling unlabeled data and is widely used in fields such as data mining, information retrieval, image segmentation, and machine learning. With the advent of the big data era, cluster analysis faces many new challenges: for example, how to cluster efficiently when the data volume is large, and how to cluster effectively as the dimensionality of the data keeps growing. For cluster analysis of high-dimensional data, subspace-based clustering ensembles are an effective approach. However, most existing subspace-based clustering ensemble algorithms incur high computational cost when constructing attribute subspaces. To address this, a completely random subspace-based clustering ensemble algorithm was proposed, which generates attribute subspaces efficiently. Because the subspaces are constructed completely at random, however, a subspace may contain no important attribute at all, which lowers the quality of the final clustering result. To improve both the effectiveness and the efficiency of subspace-based clustering ensembles, this thesis systematically investigates methods for constructing random subspaces. The main work is as follows:

(1) A method for selecting core attributes based on complementary mutual information is proposed, together with a method for generating random subspaces that contain the core attributes. Combined with certain ensemble strategies, a clustering ensemble algorithm based on random subspaces with core attributes is designed. Because random subspaces with core attributes both guarantee that each subspace can describe the overall information of the dataset and preserve the diversity among subspaces, they can improve the performance of random-subspace clustering ensemble algorithms to some extent.

(2) All attributes are stratified according to their significance, and a random subspace generation method based on the stratified attributes is proposed. A method for compressing the object set is also proposed, which generates a compressed object set within a subspace. Based on the compressed object sets in these stratified subspaces, a clustering ensemble algorithm is proposed using certain ensemble strategies. Because the subspaces are constructed from stratified attributes, attributes of different significance are reasonably distributed within each subspace while different subspaces remain diverse. Compressing the object set in each subspace also reduces the number of objects that take part in clustering. The proposed method can therefore obtain higher-quality base clustering results efficiently, and in turn improve the performance of random-subspace clustering ensemble algorithms.

(3) A multi-function clustering system is designed and implemented. The system provides data import, cluster analysis, cluster evaluation, and graphical visualization. It integrates the two clustering ensemble algorithms based on the subspace generation strategies proposed in this thesis, in order to show how different subspace construction methods affect the performance of random-subspace clustering ensemble algorithms. The system also provides several basic clustering algorithms, so it can serve as a general-purpose clustering tool.
Keywords/Search Tags:Cluster, Clustering ensemble, Random subspace, Rough set, Mutual information
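
The two subspace constructions described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not say how the core attributes are chosen beyond "complementary mutual information", nor how attribute significance is computed, so both are taken here as given inputs. The function names, the parameters, and the equal-sized split into strata are all assumptions made for illustration only.

```python
import random


def core_subspaces(n_attrs, core, n_subspaces, size, seed=0):
    """Random subspaces that all contain the given core attributes.

    Assumption: `core` (attribute indices) has already been selected by
    some criterion; the remaining slots are filled uniformly at random.
    """
    rng = random.Random(seed)
    core = sorted(set(core))
    rest = [a for a in range(n_attrs) if a not in core]
    extra = size - len(core)
    if extra < 0 or extra > len(rest):
        raise ValueError("size must fit the core plus available attributes")
    # Every subspace keeps the core and differs only in the random part.
    return [sorted(core + rng.sample(rest, extra)) for _ in range(n_subspaces)]


def stratified_subspaces(significance, n_strata, per_stratum, n_subspaces, seed=0):
    """Random subspaces drawn stratum by stratum.

    Assumption: attributes are ranked by a precomputed significance score
    and cut into `n_strata` contiguous, nearly equal groups; each subspace
    takes up to `per_stratum` attributes from every group.
    """
    rng = random.Random(seed)
    ranked = sorted(range(len(significance)),
                    key=lambda a: significance[a], reverse=True)
    bounds = [i * len(ranked) // n_strata for i in range(n_strata + 1)]
    strata = [ranked[bounds[i]:bounds[i + 1]] for i in range(n_strata)]
    subspaces = []
    for _ in range(n_subspaces):
        chosen = []
        for stratum in strata:
            chosen.extend(rng.sample(stratum, min(per_stratum, len(stratum))))
        subspaces.append(sorted(chosen))
    return subspaces
```

In an ensemble, each returned index list would select the columns on which one base clustering is run. The shared core, or the per-stratum draw, keeps every subspace informative, while the random part keeps the subspaces different from one another.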