Research On Enhanced Soft Subspace Clustering Technology

Posted on: 2012-12-05
Degree: Master
Type: Thesis
Country: China
Candidate: Q Guan
Full Text: PDF
GTID: 2178330332491433
Subject: Computer application technology
Abstract/Summary:
Cluster analysis is one of the key technologies in data mining and has been widely used in numerous applications, including electronic commerce, message filtering, bioinformatics and pattern recognition. As clustering has been applied more widely in practice, some problems have emerged, especially when dealing with large-scale, high-dimensional datasets. Clustering high-dimensional datasets remains a challenging issue.
To overcome the problem of clustering high-dimensional data, R. Agrawal first proposed the concept of subspace clustering. Subspace clustering can be divided into two categories: hard subspace clustering and soft subspace clustering. Hard subspace clustering methods identify the exact subspaces of the different clusters. Soft subspace clustering, by contrast, does not need to find an exact subspace; instead it assigns a different weight to each feature of each cluster. That is to say, it finds a fuzzy subspace for every cluster.
Soft subspace clustering has demonstrated great validity, but it still has some weaknesses. For example, almost all soft subspace clustering algorithms construct their objective functions using only within-cluster information (e.g. within-cluster compactness). It can be anticipated that the algorithms will be more efficient when between-cluster information is introduced into soft subspace clustering at the same time. In this dissertation, we study several enhanced soft subspace clustering algorithms.
This dissertation contains the following contents:
The first section is the introduction. It presents the research status and application fields of clustering technology.
The second section introduces the research field of high-dimensional data clustering and the effective methods for dealing with the problem. It then focuses on subspace clustering and on three main subspace clustering algorithms.
The third section introduces the two typical soft subspace clustering techniques, namely fuzzy weighting subspace clustering and entropy weighting subspace clustering. The merit of fuzzy weighting subspace clustering is that the fuzzy index of the weight vectors can be adjusted adaptively for different datasets. In entropy weighting subspace clustering, by contrast, the weight vectors are controlled to some extent by an entropy term.
The fourth section addresses the weakness that typical fuzzy weighting soft subspace clustering algorithms use only within-cluster information. The enhanced fuzzy weighting soft subspace clustering algorithm (EFWSSC) is presented, which introduces between-cluster information into fuzzy weighting subspace clustering. First, a new objective function is proposed that integrates between-cluster separation and within-cluster compactness in the subspace. Then, based on this objective function, new clustering rules are derived by Lagrange optimization and the new algorithm is developed. Theoretical analysis and several experiments on different data sets demonstrate that the proposed algorithm (EFWSSC) outperforms most of the existing state-of-the-art fuzzy weighting subspace clustering algorithms.
The fifth section addresses the weakness of the possibilistic clustering algorithm (PCM) on high-dimensional data. The subspace clustering mechanism is introduced into PCM, and the subspace possibilistic clustering algorithm (SPC) is presented. SPC has the advantages of the PCM algorithm as well as the characteristics of the classic subspace clustering algorithms: it adapts well to high-dimensional data and can effectively detect the subspace of each cluster. Numerical experiments with synthetic and UCI data sets demonstrate the effectiveness and the merits of SPC.
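To make the two weighting schemes named in the third section more concrete, the following is a minimal Python sketch of how per-cluster feature weights are commonly computed in the soft subspace clustering literature. It is not the thesis's EFWSSC or SPC update rule, which the abstract does not state. The function names, the fuzzy index `tau`, the entropy parameter `gamma` and the toy data are all assumptions chosen for illustration.

```python
import numpy as np

def within_cluster_dispersion(X, labels, centers):
    """D[i, k]: summed squared deviation of cluster i along feature k."""
    D = np.zeros_like(centers, dtype=float)
    for i in range(centers.shape[0]):
        members = X[labels == i]
        D[i] = ((members - centers[i]) ** 2).sum(axis=0)
    return D

def fuzzy_weights(D, tau=2.0, eps=1e-12):
    """Fuzzy weighting (generic form): w[i, k] is proportional to
    D[i, k] ** (-1 / (tau - 1)), normalised so each row sums to 1.
    tau > 1 is the fuzzy index (an assumed name)."""
    inv = (D + eps) ** (-1.0 / (tau - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def entropy_weights(D, gamma=1.0):
    """Entropy weighting (generic form): w[i, k] is proportional to
    exp(-D[i, k] / gamma), normalised so each row sums to 1.
    The row minimum is subtracted first for numerical stability."""
    shifted = -(D - D.min(axis=1, keepdims=True)) / gamma
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

# Toy data (hypothetical): cluster 0 is compact along feature 0,
# cluster 1 is compact along feature 1.
X = np.array([
    [0.0, 0.0], [0.1, 5.0], [-0.1, -5.0],
    [10.0, 10.0], [15.0, 10.1], [5.0, 9.9],
])
labels = np.array([0, 0, 0, 1, 1, 1])
centers = np.array([X[labels == i].mean(axis=0) for i in range(2)])

D = within_cluster_dispersion(X, labels, centers)
w_fuzzy = fuzzy_weights(D, tau=2.0)
w_entropy = entropy_weights(D, gamma=1.0)
print(w_fuzzy)
print(w_entropy)
```

In both schemes a feature along which a cluster is compact receives a large weight, so each cluster ends up with its own fuzzy subspace. Both rules here use only within-cluster dispersion, which is the limitation the fourth section sets out to address.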
Keywords/Search Tags: cluster analysis, fuzzy clustering, subspace clustering, feature weight, between-cluster separation