Research On Semi-Supervised Learning And Its Application

Posted on: 2012-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: W T Liu
GTID: 2218330338464064
Subject: Computer system architecture
Abstract/Summary:
Traditional machine learning research tends to study labeled data and unlabeled data separately; in real applications, however, the two often coexist. Semi-supervised learning was proposed to address this problem. In many traditional applications, semi-supervised learning is used mainly as a way to improve clustering, and it does not make full use of cluster information. Motivated by this, this thesis explores how cluster information can be used to help semi-supervised learning.

The main purpose of semi-supervised learning is to obtain a good learner from a small amount of labeled data and a large amount of unlabeled data. Self-training is an important semi-supervised learning algorithm, but it leaves two problems to be solved. First, how to choose the right newly labeled samples to add to the original labeled sample set. Second, how to deal with unlabeled samples that are labeled wrongly during the labeling process.

We propose an algorithm that addresses both problems. The general idea is as follows: after the classifier labels the unlabeled data, we apply a clustering method to process the resulting sample set. We then use data editing techniques to eliminate the wrongly labeled samples. This helps the classifier avoid wrongly labeled samples to some extent.

To measure the effectiveness of the proposed method, we test it on benchmark data sets and compare it with several other methods. The results show that the algorithm that makes use of cluster information performs better than the compared algorithms, and that it converges much faster.
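The general idea described above (label with a classifier, cluster, then edit out suspicious labels) can be sketched roughly as follows. This is only an illustrative sketch, not the thesis's actual algorithm: the abstract does not name the base classifier, the clustering method, or the editing rule. The sketch assumes a 1-nearest-neighbor classifier, a plain k-means with deterministic initialization, and an editing rule that drops any sample whose pseudo-label disagrees with the majority label of its cluster. All function names are hypothetical.

```python
from collections import Counter

def dist2(a, b):
    # Squared Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_label(x, labeled):
    # Assumed base learner: 1-nearest-neighbor over the labeled set.
    return min(labeled, key=lambda item: dist2(x, item[0]))[1]

def kmeans(points, k, iters=10):
    # Assumed clustering method: plain k-means, using the first k points
    # as initial centers so the example is deterministic.
    centers = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: dist2(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def self_train_round(labeled, unlabeled, k):
    # Step 1: the current classifier assigns pseudo-labels.
    pseudo = [(x, nearest_label(x, labeled)) for x in unlabeled]
    # Step 2: cluster the pseudo-labeled samples.
    assign = kmeans([x for x, _ in pseudo], k)
    # Step 3 (assumed editing rule): keep a sample only if its pseudo-label
    # matches the majority pseudo-label of its cluster.
    majority = {}
    for c in set(assign):
        votes = Counter(y for (_, y), a in zip(pseudo, assign) if a == c)
        majority[c] = votes.most_common(1)[0][0]
    kept = [(x, y) for (x, y), a in zip(pseudo, assign) if y == majority[a]]
    # Step 4: the surviving samples join the labeled set for the next round.
    return labeled + kept, kept

# Toy data (made up for illustration): two labeled points, seven unlabeled.
labeled = [((0.0, 0.0), "a"), ((6.0, 6.0), "b")]
unlabeled = [(0.0, 0.5), (9.0, 9.0), (1.0, 1.0), (0.5, 1.0),
             (3.5, 3.5), (10.0, 10.0), (9.5, 10.0)]
new_labeled, kept = self_train_round(labeled, unlabeled, k=2)
print(len(kept), "of", len(unlabeled), "pseudo-labeled samples kept")
```

In this toy run the point (3.5, 3.5) is pseudo-labeled "b" by the nearest-neighbor rule but falls in a cluster dominated by "a", so the editing step discards it instead of adding it to the labeled set. A full self-training loop would repeat such rounds until no unlabeled samples remain or the labeled set stops changing.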
Keywords/Search Tags: Semi-supervised Learning, Clustering, Self-training