
Semi-supervised Clustering And Feature Selection For Symbolic Data

Posted on: 2014-12-10
Degree: Master
Type: Thesis
Country: China
Candidate: W T Wang
Full Text: PDF
GTID: 2268330395489212
Subject: Computer application technology
Abstract/Summary:
In the field of machine learning, clustering and feature selection provide effective and efficient methods for data analysis. Clustering is a fundamental technique of unsupervised learning, whose task is to find the inherent structure in unlabeled data. A good clustering divides the data into several clusters so that intra-cluster similarity is maximized while inter-cluster similarity is minimized. Feature selection, on the other hand, is applied to reduce the number of features in applications where datasets have hundreds or thousands of features. It has proven, in both theory and practice, effective in enhancing learning efficiency, increasing predictive accuracy, and reducing the complexity of learned results, especially for high-dimensional datasets.

Recently, semi-supervised learning has become an attractive methodology for improving classification models and is often viewed as using unlabeled data to aid supervised learning. As in supervised learning, semi-supervised clustering and semi-supervised feature selection have been active research areas. However, most existing semi-supervised methods are designed for continuous-valued data, and few are suitable for clustering and feature selection on symbolic data. This thesis proposes two semi-supervised clustering algorithms and two semi-supervised feature selection approaches for symbolic data.

The first clustering method is based on a clustering ensemble whose base clusterers are generated by k-Modes; four voting strategies are provided to obtain the final clustering result. The second clustering method is based on a split-merge model: using equivalence relations built from both unsupervised and supervised information, we obtain small partitions whose objects are similar to each other, and the final clustering is produced by merging these partitions under four different distance measures between clusterings.

Inspired by the feature selection method mRMR in supervised learning, the first feature selection method redefines relevance and redundancy measures in the semi-supervised setting and proposes a novel stopping criterion to control the number of selected features. The second method extends the classical dependence degree of rough set theory to the semi-supervised framework, yielding a dual dependence degree that measures not only the dependence with respect to the decision attribute but also the redundancy between conditional attributes. For this semi-supervised feature selection, two different feature subset search strategies are presented.

Experiments show that the proposed semi-supervised clustering and feature selection methods for symbolic data are effective and efficient, providing an alternative solution for semi-supervised learning.
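Since the abstract does not give the thesis's exact definitions, the following Python sketch only illustrates the general mRMR-style idea described above for symbolic data: relevance is estimated on the few labeled objects, redundancy on all objects, and the thesis's stopping criterion is replaced by a fixed number k of features. The names (mutual_information, mrmr_select) and the relevance/redundancy formulas are illustrative assumptions, not the methods actually proposed in the thesis.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Empirical mutual information between two categorical sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log( p(a,b) / (p(a) p(b)) ), with counts c, px[a], py[b]
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi


def mrmr_select(features, labels, labeled_idx, k):
    """Greedy mRMR-style selection on symbolic (categorical) features.

    features:    list of columns, each a list of categorical values.
    labels:      class labels; only positions in labeled_idx are used.
    labeled_idx: indices of the labeled objects (the supervision).
    k:           number of features to select (stands in for a stop criterion).
    """
    selected = []
    remaining = list(range(len(features)))
    y = [labels[i] for i in labeled_idx]
    while remaining and len(selected) < k:
        best, best_score = None, -float("inf")
        for j in remaining:
            col = features[j]
            # Relevance: MI with the labels, computed on labeled objects only.
            relevance = mutual_information([col[i] for i in labeled_idx], y)
            # Redundancy: average MI with already selected features, on all objects.
            redundancy = (
                sum(mutual_information(col, features[s]) for s in selected) / len(selected)
                if selected else 0.0
            )
            score = relevance - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy usage: three symbolic features over six objects, two of them labeled.
# Feature 2 duplicates feature 0, so mRMR-style scoring skips it as redundant.
cols = [list("aabbcc"), list("xyxyxy"), list("aabbcc")]
labels = ["p", None, None, "q", None, None]
print(mrmr_select(cols, labels, labeled_idx=[0, 3], k=2))  # -> [0, 1]
```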
Keywords/Search Tags: Machine Learning, Semi-supervised Learning, Clustering, Feature Selection, Symbolic Data, Clustering Ensemble