Research On Theory, Algorithms And Application Of Graph-based Semi-supervised Learning

Posted on:2013-02-08

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Y D Lan

Full Text:PDF

GTID:1118330374976419

Subject:Computer application technology

Abstract/Summary:

PDF Full Text Request

How to learn from the limited observational data is a classic problem in machinelearning. The traditional supervised learning needs a large number of labeled examples toconstruct a model to predict the label of unobservable data. When the number of labeled datais small, the trained classifier is often difficult to have a good generalization performance. Inunsupervised learning, because there is no labeled data, the direction of the clusteringalgorithm is uncertainty. How to exploits a small amount of labeled data and a large numberof unlabeled data to improve learning performance has become one of the hottest topics inmachine leaning research area.The semi-supervised learning try to use a large number of unlabeled data to improvelearning performance when the labeled data is small, it has a wide range of applications.Based on analyzing the research state and the existing problems of several kinds ofsemi-supervised learning algorithms, the thesis mainly investigates the theory, algorithm andits applications in clustering, classification and dimensionality reduction. The main researchachievements are as follows:1. A semi-supervised clustering algorithm based on graph contraction was proposed inthis paper. At first, the whole data in sample space is represented as an edge-weighted graph.Then the graph is modified by contraction according to must-link constraints and graph theory.This enhanced the effect of the must-link constraints. On this basis, we project sample spaceinto a subspace by combining graph Laplacian with cannot-link constraints. Data clustering isconducted over the modified graph. Although the proposed approach utilizes graph Laplacian,it is used in the modified graph. Experimental results show that the method indeed reaches itsgoal for complex datasets, and it is acceptable when there has small amount of pairwiseconstraints.2. A dimensionality reduction algorithm is proposed by combining graph-basedsemi-supervised learning and active learning. The algorithm integrates graph, semi-supervisedlearning and support vector machine-based active learning to reduce the dimensionality ofhigh dimensional data. First, the high dimensional observational data set is embedded in aweighted graph; use Laplacian eigenmaps to get the initial low-dimensional embedding space. Second, we use support vector machine-based active learning to find the ambiguous samples.We utilize an updated version of graph-embedding; the weighting only takes into accountsamples which are of the same class, using a gravitation constant to attract same-classsamples closer. Repeat this process until the closing conditions met. Experiments show thatthe algorithm has a good performance for dimensionality reduction of high-dimensionalbiomedical image and gene expression.3. An image annotation algorithm is proposed, which uses image segmentation andsemi-supervised learning based on kNN-graph. At first, each image is segmented into severallocal regions, and the kNN-graph of all local regions is constructed from locality-sensitivehashing table. Then the kNN-graph of images is constructed from the kNN-graph of localregions. At last, the graph-based semi-supervised label propagation algorithm is used toannotate the unlabeled images. Experiments show that the algorithm can obtain a higher speedand annotation accuracy. So, the algorithm can be applied to large image data sets.4. We propose a method to calculate the similarity between two images through the localarea of the image. For two images I_m and I_n, they have been segmented into p^m and pⁿ regions. Denote the region set corresponding to image I_m as {r₁^m,r_p^m^m}, similarly theimage I_n's region set is {r₁ⁿ,...,r_pⁿⁿ}. Obviously, the similarity between image I_m and I_n isonly relative to the relationships between the pairwise regions from these two images. So wegive a method to calculate the similarity between the images according to the kNN graph oflocal regions of images.5. We propose a quick method to find the k nearest neighbors of a sample. The methodfind the k nearest neighbors through hash table and the kNN graph construct from sample'slocal properties. Experiments show that this method has a higher speed than linear scan tofind the k nearest neighbors.

Keywords/Search Tags:

Pattern Classification, Machine Learning, Semi-supervised Learning, Clustering, Dimensionality Reduction

PDF Full Text Request

Related items

1	Research On Semi-supervised Clustering And Classification Algorithm
2	Semi-Supervised Clustering And Dimensionality Reduction With Their Applications
3	Research On Graph-based Semi-supervised Algorithms And Its Applications
4	Dimension Reduction In A Number Of Issues
5	Research On Graph-Based Semi-Supervised Learning And Its Applications
6	The Study Of Graph-based Semi-supervised Learning/Dimensionality Reduction Methods And Their Applications
7	Researches On Semi-supervised Learning
8	Research On High Performance Chinese Text Classification Based On Machine Learning
9	Research On Semi-supervised Dimensionality Reduction Algorithms And Its Applications In Medical Expert System
10	Study And Application Of Several Improved Methods Of Nonlinear Dimension Reduction For High Dimensional Data