Research On Semi-supervised Learning Algorithm Based On Tri-training Algorithm

Posted on:2013-12-03

Degree:Master

Type:Thesis

Country:China

Candidate:Y Chang

Full Text:PDF

GTID:2248330374956534

Subject:Systems Engineering

Abstract/Summary:

Semi-supervised learning (SSL), an application-driven machine learning approach, has become one of the hot topics in artificial intelligence and pattern recognition. Of its main branches, semi-supervised clustering introduces a small amount of supervised information into the search for an optimal clustering, while semi-supervised classification attempts to exploit useful information implicit in unlabeled samples to assist classifier training. In recent years, researchers have introduced many SSL algorithms and applied them in practical fields such as natural language processing, image processing, and biometric identification.

Tri-training is a representative method based on the co-training mechanism. Although it can use its classifiers to annotate unlabeled samples, the algorithm requires enough labeled samples to guarantee that the initial classifiers differ sufficiently from one another, and it cannot handle supervised information that includes pair-wise constraints.

To address these shortcomings, this thesis takes Tri-training as its starting point and studies how to effectively select and label unlabeled samples for semi-supervised clustering and classification when the supervised information takes different forms. The main work is summarized in three parts:

(1) A semi-supervised clustering algorithm based on Tri-training is introduced for the case where the supervised information includes not only labeled samples but also pair-wise constraints. First, the algorithm selects some unlabeled samples and obtains their class labels in order to enlarge the initial labeled set. Second, pair-wise constraints are used to optimize the enlarged labeled set and improve its quality. Finally, the parameters of the K-Means algorithm are initialized from the optimized labeled samples, and during the search the pair-wise constraints are used to correct the clustering result at each iteration. The proposed method is applied to the K-Means, Seeded-K-Means, and COP-K-Means algorithms. Experimental results demonstrate that the method makes full use of the given supervised information and yields better clustering results.

(2) An active semi-supervised classification algorithm based on Tri-training is proposed for the case where the supervised information contains only very few labeled samples. Drawing on active learning, the method selects the unlabeled samples that are most likely to be mispredicted or that most typically represent a class, and has expert users label them to increase the number of initial labeled samples. Comparative experiments show that when the initial labeled samples are too few for Tri-training to obtain satisfactory results, the proposed method attains a better-performing classification model.

(3) An active semi-supervised classification algorithm based on Tri-training and pair-wise constraints is designed for the case where the supervised information contains pair-wise constraints. The method has expert users label informative samples so that enough labeled samples are available. During classification, pair-wise constraints are used to optimize the labeled samples each time a classifier is trained, in order to make the training data more reliable. Experimental results show that, compared with Tri-training, the method effectively handles supervised information that includes pair-wise constraints. Furthermore, compared with the variant that omits the pair-wise constraint optimization mechanism, the proposed method not only improves prediction accuracy but is also less sensitive to parameter changes and more stable in performance.

For different forms of supervised information, these results offer guidance on how to conduct SSL effectively and further extend the application prospects of Tri-training in practical fields.
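
As a rough illustration of the labeling step that Tri-training builds on, the sketch below shows the agreement rule of the standard algorithm: an unlabeled sample is proposed as a pseudo-labeled training example for one classifier when the other two classifiers predict the same label for it. The error-rate conditions that standard Tri-training also checks before accepting such samples are omitted. The second function is only one plausible way to use pair-wise constraints to clean the pseudo-labels; the abstract does not specify how the thesis does this, so the function names, the shared sample-id space, and the filtering rule are all assumptions made for illustration.

```python
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple

Label = Hashable
Pair = Tuple[int, int]


def agreement_pseudo_labels(
    predictions: Sequence[Sequence[Label]],
) -> List[Dict[int, Label]]:
    """Tri-training agreement rule.

    predictions[k][j] is classifier k's predicted label for unlabeled
    sample j. For each classifier k, return the samples on which the
    other two classifiers agree, mapped to the agreed label.
    """
    if len(predictions) != 3:
        raise ValueError("Tri-training uses exactly three classifiers")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all classifiers must predict on the same samples")

    proposals: List[Dict[int, Label]] = []
    for k in range(3):
        a, b = [i for i in range(3) if i != k]  # the two other classifiers
        proposals.append(
            {j: predictions[a][j]
             for j in range(n)
             if predictions[a][j] == predictions[b][j]}
        )
    return proposals


def drop_constraint_violations(
    pseudo: Dict[int, Label],
    known: Dict[int, Label],
    must_link: Iterable[Pair],
    cannot_link: Iterable[Pair],
) -> Dict[int, Label]:
    """Hypothetical filter (an assumption, not the thesis's method).

    Remove every pseudo-labeled sample whose label contradicts a
    must-link or cannot-link constraint with a sample whose label is
    trusted. Sample ids in `pseudo` and `known` share one id space.
    """
    bad = set()
    for pairs, must_match in ((must_link, True), (cannot_link, False)):
        for i, j in pairs:
            for u, v in ((i, j), (j, i)):
                if u in pseudo and v in known:
                    same = pseudo[u] == known[v]
                    if same != must_match:
                        bad.add(u)
    return {s: y for s, y in pseudo.items() if s not in bad}
```

In a full training loop, each classifier would be retrained on the original labeled samples plus its own filtered proposals, and the round would repeat until no classifier changes.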

Keywords/Search Tags:

Semi-supervised learning, Tri-training algorithm, Seeds sets, Pair-wise constraints, Active learning

Related items

1	Semi-supervised Learning On Text Data
2	Some contributions to semi-supervised learning
3	Semi-supervised Clustering Algorithm And Implementation Based On Seeds Set And Pairwise Constraints
4	Research On Partially Labeled Problem Based On Active Learning And Semi-supervised Mechanism
5	A Semi-supervised Spectral Clustering Algorithm Research Based On Active Learning
6	Research On Active Learning Algorithms Of Pairwise Constraints In Semi-supervised Clustering
7	Research On Parallel Implementation Of Semi-Supervised Clustering
8	Several Theoretical Issues On Semi-supervised Learning
9	Research On Optimization Of Semi-supervised Classification Algorithm Combining With Active Learning
10	Research On Chinese Parallel Structure Recognition Based On Semi-Supervised Learning