Research On Clustering Ensemble Methods And Their Applications

Posted on:2012-10-26

Degree:Master

Type:Thesis

Country:China

Candidate:S Chen

Full Text:PDF

GTID:2218330368483548

Subject:Computer application technology

Abstract/Summary:

Clustering analysis is an important research field in data mining and has been widely applied in statistics, biology, and marketing. Hundreds of clustering algorithms have been proposed in recent years. However, conventional clustering algorithms often suffer from the curse of dimensionality and therefore perform poorly on high-dimensional data. Soft subspace clustering is an effective means of processing high-dimensional data, but most existing soft subspace clustering algorithms contain parameters that are difficult for users to determine. In real-world applications, it is also difficult to find a single clustering algorithm that can handle clusters of all shapes and sizes, or to determine which clustering algorithm should be used for a particular dataset. Many scholars have therefore begun to study clustering ensemble methods. Clustering ensembles can go beyond a single clustering algorithm in robustness, novelty, stability, parallelization, and scalability.

This thesis first reviews subspace clustering, clustering ensembles, semi-supervised learning, and imbalanced data classification. To overcome the limitations of traditional subspace clustering algorithms, we then propose a new soft subspace clustering algorithm named SC-IFWSA. It uses an improved feature weight self-adjustment mechanism (IFWSA), so users do not need to set any parameter values, and it adaptively updates the weights of all dimensions for each cluster according to their adjustment margins.

Based on clustering ensembles, we further propose two new methods that address the shortcomings of traditional semi-supervised classification and imbalanced data classification, respectively, so as to improve classifier performance:

(1) A new semi-supervised classification algorithm based on clustering ensembles, named SSCCE, which uses an easily understandable labeling confidence estimation method. It first generates multiple partitions of the given data and matches the clusters across partitions. The unlabeled training samples with a high clustering consistency index are then labeled and added to the labeled training set. Finally, a learner is trained on the enlarged labeled training set.

(2) A novel classification method for imbalanced data sets based on clustering ensembles. It aims to give classifiers better training data by using the clustering consistency index to find the minority examples at cluster boundaries and the majority examples at cluster centers, and then applying an improved synthetic minority over-sampling technique (SMOTE) and a modified random under-sampling method, respectively, to rebalance the data.

Experimental results show that the three proposed methods are effective and feasible, and that they perform better on most data sets.
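The sample-selection step of SSCCE described above (generate several partitions, match clusters across them, keep the samples whose cluster assignment is consistent) might be sketched roughly as follows. The abstract does not give the thesis's exact definition of the clustering consistency index, so this sketch assumes one plausible reading: the fraction of aligned partitions that place a sample in its majority cluster. The function names, the brute-force cluster matching, and the 0.8 threshold are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import permutations


def align_to_reference(reference, partition, k):
    """Relabel `partition` so its cluster ids agree with `reference` as often as possible.

    Assumption: brute force over all k! label permutations, which is only
    practical for a small number of clusters k.
    """
    best_map, best_agree = None, -1
    for perm in permutations(range(k)):
        agree = sum(1 for r, p in zip(reference, partition) if perm[p] == r)
        if agree > best_agree:
            best_map, best_agree = perm, agree
    return [best_map[p] for p in partition]


def consistency_index(partitions, k):
    """Return one (majority_label, consistency) pair per sample.

    Assumed definition: consistency is the fraction of aligned partitions
    that assign the sample to its majority cluster.
    """
    reference = list(partitions[0])
    aligned = [reference] + [align_to_reference(reference, p, k) for p in partitions[1:]]
    scores = []
    for votes in zip(*aligned):
        label, count = Counter(votes).most_common(1)[0]
        scores.append((label, count / len(aligned)))
    return scores


def select_confident(partitions, k, unlabeled_idx, threshold=0.8):
    """Keep the unlabeled sample indices whose consistency reaches the threshold."""
    scores = consistency_index(partitions, k)
    return [i for i in unlabeled_idx if scores[i][1] >= threshold]


# Three toy partitions of six samples into k = 2 clusters.
# The second partition is the first with its two labels swapped.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
]
print(select_confident(partitions, 2, unlabeled_idx=[1, 2, 4]))  # prints [1, 4]
```

Sample 2 is dropped because only two of the three aligned partitions agree on its cluster. In a full pipeline, the selected samples would then be labeled and appended to the labeled training set before the final learner is trained; the same consistency scores could in principle also flag boundary and center examples for the imbalanced-data method, although the abstract does not specify how.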

Keywords/Search Tags:

Clustering Ensembles, Classification, High Dimensional Data, Subspace Clustering, Semi-Supervised Learning, Imbalanced Data Sets

Related items

1	Semi-supervised Subspace Clustering Based On Space-level Constraint
2	Adaptive Semi-supervised Clustering Ensemble For High Dimensional Data
3	Semi Supervised Clustering Algorithm And Its Application And Research
4	Research On Semi-supervised Classification Of Data Stream Based On Adaptive Density Clustering
5	Research On Semi-supervised Clustering And Classification Algorithm
6	The Study Of Semi-supervised Subspace Clustering And Its Applications
7	Research On Key Technologies Of Clustering High-dimensional Data Based On Sparse Subspace And Their Applications
8	Research Of Image Clustering And Classification In Subspace
9	Study On High-dimensional Data Subspace Clustering Analysis And Application
10	Research On Subspace Clustering Algorithms For High-dimensional Data