
Research On Semi-supervised Learning Algorithm Based On Tri-training Algorithm

Posted on: 2013-12-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y Chang
Full Text: PDF
GTID: 2248330374956534
Subject: Systems Engineering
Abstract/Summary:
Semi-supervised learning (SSL), an application-driven family of machine learning methods, has become one of the active research topics in artificial intelligence and pattern recognition. Its two main branches are semi-supervised clustering, which introduces a small amount of supervised information into the search for an optimal clustering, and semi-supervised classification, which tries to exploit useful information implicit in unlabeled samples to assist classifier training. Researchers have proposed many SSL algorithms and applied them in practical fields such as natural language processing, image processing, and biometric identification.

Tri-training is a representative method based on the co-training mechanism. Although it can use its classifiers to annotate unlabeled samples, it requires sufficient labeled samples to ensure that the initial classifiers differ substantially, and it cannot handle supervised information that includes pair-wise constraints.

To address these shortcomings, this thesis takes Tri-training as its starting point and studies how to effectively select and label unlabeled samples for semi-supervised clustering and classification under different forms of supervised information. The main work consists of three parts:

(1) A semi-supervised clustering algorithm based on Tri-training is introduced for the case where the supervised information includes both labeled samples and pair-wise constraints. First, the algorithm selects some unlabeled samples and obtains their class labels to enlarge the initial labeled set. Second, it uses the pair-wise constraints to refine the enlarged labeled set and improve its quality.
Finally, it initializes the parameters of the K-Means algorithm from the refined labeled samples and, during the search, uses the pair-wise constraints to correct the clustering result at each iteration. The proposed method is also applied to the K-Means, Seeded-K-Means, and COP-K-Means algorithms. Experimental results show that it makes full use of the given supervised information and obtains better clustering results.

(2) An active semi-supervised classification algorithm based on Tri-training and few labeled samples is proposed for the case where the supervised information contains only very few labeled samples. Drawing on the idea of active learning, the method selects the unlabeled samples that are most likely to be mispredicted or that most typically represent their class, and has expert users label them to enlarge the initial labeled set. Comparative experiments show that when the initial labeled samples are so few that Tri-training cannot obtain satisfactory results, the proposed method still yields a classification model with better performance.

(3) An active semi-supervised classification algorithm based on Tri-training and pair-wise constraints is designed for the case where the supervised information includes pair-wise constraints. The method has expert users label informative samples so that enough labeled samples are available, and during classification it uses the pair-wise constraints to refine the labeled samples used to train each classifier in every round, in order to improve the reliability of the training data. Experimental results show that, compared with Tri-training, the method effectively handles supervised information that includes pair-wise constraints.
Furthermore, compared with a version of the algorithm without the pair-wise-constraint refinement mechanism, the proposed method not only improves prediction accuracy but is also less affected by parameter changes and performs more stably.

For different forms of given supervised information, the results of this thesis provide a reference for conducting SSL effectively and further extend the application prospects of Tri-training in practical fields.
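The Tri-training mechanism that the thesis builds on can be illustrated with a minimal sketch. It follows the commonly described scheme in which each of three classifiers receives extra training data from the unlabeled samples that the other two classifiers label identically, and accepts that data only if the estimated number of mislabeled samples shrinks. The classifier interface (plain Python callables) and the names `pseudo_label_for`, `joint_error` and `should_update` are hypothetical, the acceptance test is simplified (the published algorithm also subsamples the new set, which is omitted here), and this is not the thesis's own code.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

# Hypothetical interface: a classifier is any callable mapping a sample to a label.
Classifier = Callable[[Sequence[float]], Hashable]
Labeled = List[Tuple[Sequence[float], Hashable]]


def _others(i: int) -> Tuple[int, int]:
    """Indices of the two classifiers other than classifier i."""
    j, k = [idx for idx in range(3) if idx != i]
    return j, k


def pseudo_label_for(classifiers: List[Classifier],
                     unlabeled: List[Sequence[float]], i: int) -> Labeled:
    """Unlabeled samples on which the other two classifiers agree, paired with
    the agreed label; these become extra training data for classifier i."""
    j, k = _others(i)
    new_data = []
    for x in unlabeled:
        y = classifiers[j](x)
        if y == classifiers[k](x):
            new_data.append((x, y))
    return new_data


def joint_error(classifiers: List[Classifier], labeled: Labeled, i: int) -> float:
    """Estimated error of the other two classifiers: among labeled samples on
    which they agree, the fraction where the agreed label is wrong."""
    j, k = _others(i)
    agree = wrong = 0
    for x, y in labeled:
        y_j = classifiers[j](x)
        if y_j == classifiers[k](x):
            agree += 1
            if y_j != y:
                wrong += 1
    # Assumption: fall back to 0.5 when the two classifiers never agree.
    return wrong / agree if agree else 0.5


def should_update(err: float, n_new: int, prev_err: float, prev_n: int) -> bool:
    """Simplified acceptance test: retrain only if the error rate dropped and the
    expected number of mislabeled samples (err * n) is smaller than before."""
    return err < prev_err and err * n_new < prev_err * prev_n
```

In a full loop, each round would call `pseudo_label_for` for i = 0, 1, 2, check `should_update`, and retrain classifier i on its original labeled data plus the accepted pseudo-labeled samples.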
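Part (1) combines seeded initialization with constraint handling during the K-Means search. The sketch below is a minimal illustration under stated assumptions, not the thesis's algorithm: it initializes each centroid as the mean of that cluster's seed (labeled) samples, as in Seeded-K-Means, and in every assignment pass enforces must-link and cannot-link pairs in the manner of COP-K-Means, placing each point in the nearest cluster that violates no constraint with already-assigned points. The function name `seeded_cop_kmeans` and its parameters are hypothetical, and cluster ids are assumed to be 0..k-1.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Pair = Tuple[int, int]  # a pair-wise constraint between two point indices


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def _mean(members: List[Sequence[float]]) -> Tuple[float, ...]:
    return tuple(sum(col) / len(members) for col in zip(*members))


def _partner(idx: int, pair: Pair) -> Optional[int]:
    """The other index of a constraint pair that involves idx, else None."""
    a, b = pair
    if a == idx:
        return b
    if b == idx:
        return a
    return None


def _violates(idx, cluster, assign, must_link, cannot_link) -> bool:
    """COP-K-Means style check against points already assigned in this pass."""
    for pair in must_link:
        other = _partner(idx, pair)
        if other in assign and assign[other] != cluster:
            return True
    for pair in cannot_link:
        other = _partner(idx, pair)
        if other in assign and assign[other] == cluster:
            return True
    return False


def seeded_cop_kmeans(points, seeds: Dict[int, int], k: int,
                      must_link=(), cannot_link=(), max_iter: int = 100):
    """points: list of vectors; seeds: {point index: cluster id in 0..k-1}."""
    # Seeded initialization: each centroid is the mean of its seed points.
    centroids = []
    for c in range(k):
        members = [points[i] for i, lab in seeds.items() if lab == c]
        if not members:
            raise ValueError(f"cluster {c} has no seed")
        centroids.append(_mean(members))
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        assign: Dict[int, int] = {}
        for idx, p in enumerate(points):
            # Try clusters from nearest to farthest; keep the first feasible one.
            for c in sorted(range(k), key=lambda cl: _dist2(p, centroids[cl])):
                if not _violates(idx, c, assign, must_link, cannot_link):
                    assign[idx] = c
                    break
            else:
                raise ValueError(f"point {idx} cannot be assigned without violating a constraint")
        new_labels = [assign[i] for i in range(len(points))]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [points[i] for i, lab in enumerate(labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = _mean(members)
    return labels
```

For example, with points `[(0.0,), (0.2,), (0.4,), (5.0,), (5.2,), (2.6,)]` and seeds `{0: 0, 3: 1}`, the point 2.6 joins cluster 1 on distance alone, but a must-link pair `(5, 0)` pulls it into cluster 0. Here the seeds only initialize the centroids; a Constrained-K-Means variant would additionally keep seed points fixed in their clusters.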
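For part (2), one common way to find unlabeled samples that are likely to be mispredicted is to measure how strongly the three Tri-training classifiers disagree on them, in the spirit of query-by-committee. The following sketch assumes that criterion; the thesis may use a different one, and its second criterion (samples that most typically represent a class) is not covered. `select_queries` is a hypothetical name, and the samples it returns would then be handed to an expert for labeling.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence

Classifier = Callable[[Sequence[float]], Hashable]


def select_queries(classifiers: List[Classifier], unlabeled: List, n_queries: int) -> List[int]:
    """Indices of the n_queries unlabeled samples with the highest committee
    disagreement (assumed proxy for 'most likely to be mispredicted')."""

    def disagreement(x) -> float:
        votes = Counter(clf(x) for clf in classifiers)
        # Fraction of classifiers that do not vote for the majority label.
        return 1.0 - max(votes.values()) / len(classifiers)

    scores = [disagreement(x) for x in unlabeled]
    # sorted() is stable, so ties keep their original order.
    ranked = sorted(range(len(unlabeled)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_queries]
```

Samples on which all three classifiers agree score 0 and are never queried first; a three-way split scores highest.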
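Parts (1) and (3) both use pair-wise constraints to refine a labeled set, but the abstract does not spell out the rule. The following is only one plausible sketch under an assumed policy: any pseudo-labeled sample that takes part in a violated constraint (a must-link pair with different labels, or a cannot-link pair with the same label) is dropped, while samples labeled by an expert are always kept. `refine_labeled` and the `trusted` set are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Pair = Tuple[int, int]


def refine_labeled(labels: Dict[int, Hashable], trusted: Set[int],
                   must_link: Iterable[Pair], cannot_link: Iterable[Pair]) -> Dict[int, Hashable]:
    """Remove labels that contradict pair-wise constraints.

    labels: {sample index: label}, mixing expert and pseudo labels.
    trusted: indices labeled by an expert (assumed correct, never dropped).
    """
    suspicious: Set[int] = set()
    for a, b in must_link:
        # Must-link pair labeled differently: at least one label is wrong.
        if a in labels and b in labels and labels[a] != labels[b]:
            suspicious.update((a, b))
    for a, b in cannot_link:
        # Cannot-link pair labeled identically: at least one label is wrong.
        if a in labels and b in labels and labels[a] == labels[b]:
            suspicious.update((a, b))
    return {i: y for i, y in labels.items() if i not in suspicious or i in trusted}
```

Dropping both members of a violated pair is deliberately conservative; a finer policy could drop only the member with the lower classifier confidence.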
Keywords: Semi-supervised learning, Tri-training algorithm, Seed sets, Pair-wise constraints, Active learning