
Research On Key Technologies Of Subspace Cluster Ensemble

Posted on: 2015-03-01
Degree: Master
Type: Thesis
Country: China
Candidate: B Liu
GTID: 2268330428979197
Subject: Computer technology
Abstract/Summary:
Subspace clustering algorithms effectively reduce the influence of redundant and irrelevant attributes during clustering and thereby improve clustering accuracy. Existing subspace clustering algorithms emphasize finding clusters in all subspaces and ignore how the subspaces are divided. In a high-dimensional dataset, however, the division of subspaces directly affects clustering accuracy, so improving accuracy requires a sound subspace dividing method.

In this thesis, two methods are proposed to divide a dataset into subspaces. The first is based on the minimum redundancy feature subset, and the second is based on the maximum margin.

The method based on the minimum redundancy feature subset improves on the K-means algorithm. Instead of calculating the distance between feature variables, it calculates the mutual information between them and divides the data subspaces according to the mutual information values. A subspace divided by this method is called a minimum redundancy feature subspace.

The method based on the maximum margin relies on the mutual information between each pair of attributes. A characteristic matrix is built from the mutual information value of each attribute pair, and a gridding (meshing) method is applied to the characteristic matrix to obtain sub-blocks. The maximal information coefficient is obtained by searching for the maximal mutual information value over these sub-blocks, and it reflects the correlation between each pair of attributes. The greater the correlation, the smaller the margin; the smaller the correlation, the greater the margin. According to the maximal information coefficient, the subspaces are divided following the maximum margin principle.

Finally, experiments are performed to verify the validity of the two subspace dividing methods. Experiments on UCI and NIPS 2013 competition datasets show that, on most datasets, both methods perform better than other subspace clustering algorithms.
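Both methods start from the mutual information between pairs of attributes. As a rough illustration only, the Python sketch below estimates a pairwise mutual information matrix from a numeric data matrix. The equal-width histogram estimator, the bin count of 8, the function names `mutual_information` and `mi_matrix`, and the synthetic data are all assumptions made for this example; the abstract does not say how the thesis estimates mutual information, and the sketch does not reproduce the K-means-based or maximum-margin dividing steps.

```python
import numpy as np


def mutual_information(x, y, bins=8):
    """Plug-in estimate of I(X;Y) in nats from an equal-width 2-D histogram.

    The estimator and the default bin count are assumptions for this sketch.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()              # joint distribution over bins
    px = pxy.sum(axis=1, keepdims=True)    # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)    # marginal of y, shape (1, bins)
    nonzero = pxy > 0                      # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nonzero] * np.log(pxy[nonzero] / (px @ py)[nonzero])))


def mi_matrix(data, bins=8):
    """Symmetric matrix whose (i, j) entry is the estimated MI of attributes i and j."""
    n_attrs = data.shape[1]
    m = np.zeros((n_attrs, n_attrs))
    for i in range(n_attrs):
        for j in range(i, n_attrs):
            m[i, j] = m[j, i] = mutual_information(data[:, i], data[:, j], bins)
    return m


# Synthetic example: attribute 1 is a noisy copy of attribute 0,
# attribute 2 is generated independently of both.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
data = np.column_stack([a, a + 0.1 * rng.normal(size=500), rng.normal(size=500)])

M = mi_matrix(data)
print(np.round(M, 3))
```

In this synthetic example the entry for the redundant pair (attributes 0 and 1) comes out larger than the entry for the independent pair (attributes 0 and 2), which is the kind of signal a dividing method based on mutual information could use to decide which attributes belong in the same subspace.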
Keywords/Search Tags: Clustering, Subspace Clustering, Minimum Redundancy Feature Subset, Attribute Maximum Margin