
Research On Several Algorithms And Theories In Diversity-Based Semi-Supervised Learning

Posted on: 2013-10-12    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Z Jiang    Full Text: PDF
GTID: 1228330434971226    Subject: Computer applications
Abstract/Summary:
Training a reliable classifier relies on the availability of sufficient labeled data in traditional machine learning. However, in many real-world tasks, such as text classification and gene analysis, labeled data are often difficult or expensive to obtain, while unlabeled data are readily available in abundance thanks to the development of data collection and storage technologies. Consequently, how to improve the generalization ability of classifiers trained on few labeled data has become an important issue in machine learning research.

Semi-supervised learning (SSL), which attempts to learn from both labeled and unlabeled data, can effectively overcome this "labeling bottleneck" and has great practical significance and development prospects. Co-training style algorithms exploit unlabeled data easily and controllably and require no additional prior knowledge for model assumptions; they can therefore readily be combined with existing supervised learning models and have achieved broader success than other semi-supervised learning algorithms. On the other hand, ensemble learning combines multiple diverse classifiers to improve generalization ability, and in recent years some ensemble learning algorithms have also begun to exploit unlabeled data. Both these algorithms and co-training style algorithms exploit unlabeled data based on the diversity among base classifiers, so they can be grouped into a subcategory we call "diversity-based semi-supervised learning". How to effectively create and exploit diversity, and how to control the noise in pseudo-labeled data, are the main problems these algorithms must address.

In this thesis, our research concentrates on the design and theory of several algorithms based on co-training and its combination with ensemble learning. The main results and innovations are as follows:

Firstly, we investigate the combination of generative and discriminative methods in the co-training framework, acquiring diversity by employing these two complementary methods so that co-training can cope with real applications where independent views are rarely available. We propose a backtracking mechanism in co-training to increase safety when exploiting unlabeled data. Furthermore, we introduce a pair of weight parameters to regulate the weight of pseudo-labeled data, avoiding the local optima caused by non-convex objective functions, and define a hybrid objective function to tune their values during co-training.

Secondly, we present a more general co-training style framework, Co-learning, in which multiple classifiers can work with multiple sources of diversity, and we develop two concrete algorithms that combine multiple diverse classifiers according to different training schemes. Furthermore, we present a new method to create diversity by manipulating the pseudo-labeled data.

Thirdly, we investigate the combination of ensemble learning and co-training style algorithms and present two algorithms: SECL and PECL. We define a voting margin function, combined with confidence, to select pseudo-labeled data and to predict unlabeled data. Additionally, we propose a weighted bagging method to generate an ensemble of diverse classifiers at the end of co-training.
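To make the flavor of such diversity-based loops concrete, the following is a minimal sketch, not the thesis's actual algorithms: two complementary learners (a generative GaussianNB and a discriminative LogisticRegression, standing in for the generative/discriminative pair of the first contribution) repeatedly pseudo-label the unlabeled points whose averaged-vote margin exceeds a threshold. The function names and the margin threshold are our own illustrative choices.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB            # generative learner
from sklearn.linear_model import LogisticRegression   # discriminative learner

def voting_margin(probas):
    """Gap between the top two classes of the averaged vote.

    probas: array of shape (n_learners, n_samples, n_classes).
    """
    avg = probas.mean(axis=0)            # average the learners' votes
    top2 = np.sort(avg, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]       # large gap = confident vote

def co_train(X_lab, y_lab, X_pool, rounds=10, margin_thresh=0.6):
    """Minimal diversity-based co-training loop (illustrative only).

    Assumes integer labels 0..k-1 and that every class appears in y_lab,
    so both learners share the same class ordering in predict_proba.
    """
    learners = [GaussianNB(), LogisticRegression(max_iter=1000)]
    X, y = np.asarray(X_lab), np.asarray(y_lab)
    pool = np.asarray(X_pool)
    for _ in range(rounds):
        if len(pool) == 0:
            break
        for m in learners:
            m.fit(X, y)
        probas = np.stack([m.predict_proba(pool) for m in learners])
        margins = voting_margin(probas)
        keep = margins >= margin_thresh
        if not keep.any():               # no confident pseudo-labels left
            break
        pseudo = probas.mean(axis=0).argmax(axis=1)
        X = np.vstack([X, pool[keep]])           # grow the labeled set
        y = np.concatenate([y, pseudo[keep]])
        pool = pool[~keep]                       # shrink the unlabeled pool
    return learners
```

The margin-over-confidence selection here only mirrors the role the voting margin function plays in SECL and PECL; the actual algorithms differ in how they weight, retrain, and combine the classifiers.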
Fourthly, aiming at the characteristics of diversity-based semi-supervised learning, we define a hybrid classification-and-distribution (HCAD) noise and give a Probably Approximately Correct (PAC) analysis of co-training style algorithms in the presence of HCAD noise in the training data. Moreover, based on the voting margin, we derive an upper bound on the generalization error of multi-classifier voting systems in the presence of HCAD noise.
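The thesis's exact bound is not reproduced in this abstract; as a reference point, the standard weighted-vote margin on which such margin-based bounds are built can be written as follows (our notation, which may differ from the thesis's):

```latex
% Voting margin of a weighted ensemble \{(h_t, \alpha_t)\}_{t=1}^{T}
% on an example (x, y); standard definition, not necessarily the thesis's.
\[
  \operatorname{margin}(x, y) \;=\;
  \sum_{t=1}^{T} \alpha_t \,\mathbb{1}[h_t(x) = y]
  \;-\; \max_{y' \neq y} \sum_{t=1}^{T} \alpha_t \,\mathbb{1}[h_t(x) = y'],
  \qquad \sum_{t=1}^{T} \alpha_t = 1 .
\]
% A large positive margin means the weighted vote for the true label
% dominates every competing label, which is the quantity margin-based
% generalization bounds reward.
```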
Keywords/Search Tags: Semi-supervised Learning, Ensemble Learning, Diversity, Co-training, Classification