
The Research On Semi-supervised Collaboration-training Algorithm

Posted on: 2012-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: X Lan
Full Text: PDF
GTID: 2218330374453755
Subject: Computer application technology
Abstract/Summary:
Traditional supervised learning needs a large amount of labeled data to train a classifier. In practice, however, labeled data is hard to obtain, while unlabeled data is much easier to get. Semi-supervised learning, which takes advantage of unlabeled data to improve the accuracy of a classifier, has received much attention and has become a topic of significant research interest.

Collaboration-training is a class of semi-supervised learning algorithms that is easy to understand, stable, and fast to converge. It has attracted the attention of many researchers and produced many results. As collaboration-training theory has matured, applications based on it have gradually spread into many fields, such as Natural Language Processing, Content-Based Image Retrieval, and Pattern Recognition.

In this dissertation, the current state of research on semi-supervised collaboration-training algorithms, both at home and abroad, is introduced first. The development of collaboration-training and the main issues that remain in it are then analyzed. Finally, the main research work addressing these issues is presented in detail, as follows:

1. To resolve the problem of dependence among classifiers in collaboration-training, a multi-view semi-supervised neural network algorithm based on the Tri-training framework is proposed. The algorithm increases the independence of the neural networks by diversifying their transfer functions, which improves classification accuracy and the performance of collaboration-training.

2. To resolve the problem of noisy data among the unlabeled data used to update the classifiers, a Genetic Algorithm-based semi-supervised collaboration-training algorithm for choosing unlabeled data is proposed. The algorithm uses the optimization capability of the Genetic Algorithm to help collaboration-training pick out valuable unlabeled data indirectly. This helps update the classifiers effectively, prevents noisy unlabeled data from being introduced, and helps prevent the algorithm from degrading.

3. To select unlabeled data more effectively, another solution, a graph-based explicit confidence estimation semi-supervised collaboration-training algorithm, is proposed. It combines the advantages of graph-based semi-supervised learning and collaboration-training. It uses the structural information of labeled and unlabeled data to calculate the probability of unlabeled data explicitly, and it combines three classifiers to calculate the confidence of unlabeled data implicitly. With this dual-confidence estimation, unlabeled data is chosen to update the classifiers effectively. Experiments on UCI datasets demonstrate the efficiency of this algorithm.
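
The dual-confidence idea in the third contribution can be illustrated with a minimal sketch. The abstract does not give the thesis's formulas, so everything below is an assumption made for illustration: the explicit confidence is approximated by Gaussian-weighted similarity to labeled points (a one-step stand-in for graph-based propagation), the implicit confidence is a majority vote among three classifiers, and the names `graph_confidence`, `vote_confidence`, `select_unlabeled`, and the thresholds `graph_min` and `min_votes` are hypothetical.

```python
import math
from collections import Counter


def graph_confidence(x, labeled, sigma=1.0):
    """Explicit confidence (assumed form): class probabilities for x from
    Gaussian-weighted similarity to the labeled points."""
    weights = {}
    for point, label in labeled:
        dist_sq = sum((a - b) ** 2 for a, b in zip(x, point))
        sim = math.exp(-dist_sq / (2 * sigma ** 2))
        weights[label] = weights.get(label, 0.0) + sim
    total = sum(weights.values())
    if total == 0.0:
        return {}  # x is too far from every labeled point to say anything
    return {label: w / total for label, w in weights.items()}


def vote_confidence(x, classifiers):
    """Implicit confidence (assumed form): majority label among the
    classifiers and the number of classifiers that voted for it."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0]


def select_unlabeled(unlabeled, labeled, classifiers, graph_min=0.8, min_votes=2):
    """Keep an unlabeled point only if both confidences support the same label."""
    selected = []
    for x in unlabeled:
        label, count = vote_confidence(x, classifiers)
        probs = graph_confidence(x, labeled)
        if count >= min_votes and probs.get(label, 0.0) >= graph_min:
            selected.append((x, label))
    return selected


# Toy data; the three threshold rules stand in for trained classifiers.
labeled = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"),
           ((3.0, 3.0), "b"), ((3.2, 2.9), "b")]
classifiers = [
    lambda x: "a" if x[0] < 1.5 else "b",
    lambda x: "a" if x[1] < 1.5 else "b",
    lambda x: "a" if x[0] + x[1] < 3.0 else "b",
]
unlabeled = [(0.2, 0.1), (2.9, 3.1), (1.5, 1.5)]
selected = select_unlabeled(unlabeled, labeled, classifiers)
```

In this toy run all three classifiers agree on the midpoint `(1.5, 1.5)`, but the similarity-based estimate is split between the two classes, so the point is not selected. This is the kind of potentially noisy sample that a second, explicit confidence check is meant to filter out.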
Keywords/Search Tags: Semi-supervised Learning, Collaboration-training Algorithm, Tri-training Algorithm, Co-training, Graph-based Semi-supervised Learning