
Research On Several Algorithms And Theories In Diversity-Based Semi-Supervised Learning

Posted on: 2013-10-12    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Z Jiang    Full Text: PDF
GTID: 1228330434971226    Subject: Computer applications
Abstract/Summary:
Training a reliable classifier relies on the availability of sufficient labeled data in traditional machine learning. However, in many real-world tasks, such as text classification and gene analysis, labeled data are often difficult or expensive to obtain, while unlabeled data are readily available in abundance thanks to the development of data collection and storage technologies. Consequently, how to improve the generalization ability of classifiers trained on few labeled data has become an important issue in machine learning research.

Semi-supervised learning (SSL), which attempts to learn from both labeled and unlabeled data, can effectively overcome this "labeling bottleneck" and has great practical significance and development prospects. Co-training style algorithms exploit unlabeled data easily and controllably and require no additional prior knowledge for model assumptions; they can therefore readily be combined with existing supervised learning models and have achieved broader success than other semi-supervised learning algorithms. On the other hand, ensemble learning combines multiple diverse classifiers to improve generalization ability, and in recent years some ensemble learning algorithms have also begun to exploit unlabeled data. Both these algorithms and co-training style algorithms exploit unlabeled data based on the diversity among base classifiers, so they can be grouped into a subcategory we call "diversity-based semi-supervised learning". How to effectively create and exploit diversity, and how to control the noise in pseudo-labeled data, are the main problems these algorithms must address.

In this thesis, our research concentrates on the design and theory of several algorithms based on co-training and its combination with ensemble learning. The main results and innovations are as follows:

Firstly, we investigate the combination of generative and discriminative methods in the co-training framework, acquiring diversity by employing these two complementary methods so that co-training can cope with real applications where independent views are rarely available. We propose a backtracking mechanism in co-training to increase safety when exploiting unlabeled data. Furthermore, we introduce a pair of weight parameters to regulate the weight of pseudo-labeled data, avoiding the local optima caused by non-convex objective functions, and define a hybrid objective function to tune their values during co-training.

Secondly, we present a more general co-training style framework, Co-learning, in which multiple classifiers can work with multiple sources of diversity, and we develop two concrete algorithms that combine multiple diverse classifiers according to different training schemes. Furthermore, we present a new method to create diversity by manipulating the pseudo-labeled data.

Thirdly, we investigate the combination of ensemble learning and co-training style algorithms and present two algorithms: SECL and PECL. We define a voting margin function, combined with confidence, to select pseudo-labeled data and to predict unlabeled data. Additionally, we propose a weighted bagging method to generate an ensemble of diverse classifiers at the end of co-training.
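To make the flavor of such diversity-based loops concrete, the following is a minimal sketch, not the thesis's actual algorithms: two complementary learners (a generative GaussianNB and a discriminative LogisticRegression, standing in for the generative/discriminative pair of the first contribution) repeatedly pseudo-label the unlabeled points whose averaged-vote margin exceeds a threshold. The function names and the margin threshold are our own illustrative choices.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB            # generative learner
from sklearn.linear_model import LogisticRegression   # discriminative learner

def voting_margin(probas):
    """Gap between the top two classes of the averaged vote.

    probas: array of shape (n_learners, n_samples, n_classes).
    """
    avg = probas.mean(axis=0)            # average the learners' votes
    top2 = np.sort(avg, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]       # large gap = confident vote

def co_train(X_lab, y_lab, X_pool, rounds=10, margin_thresh=0.6):
    """Minimal diversity-based co-training loop (illustrative only).

    Assumes integer labels 0..k-1 and that every class appears in y_lab,
    so both learners share the same class ordering in predict_proba.
    """
    learners = [GaussianNB(), LogisticRegression(max_iter=1000)]
    X, y = np.asarray(X_lab), np.asarray(y_lab)
    pool = np.asarray(X_pool)
    for _ in range(rounds):
        if len(pool) == 0:
            break
        for m in learners:
            m.fit(X, y)
        probas = np.stack([m.predict_proba(pool) for m in learners])
        margins = voting_margin(probas)
        keep = margins >= margin_thresh
        if not keep.any():               # no confident pseudo-labels left
            break
        pseudo = probas.mean(axis=0).argmax(axis=1)
        X = np.vstack([X, pool[keep]])           # grow the labeled set
        y = np.concatenate([y, pseudo[keep]])
        pool = pool[~keep]                       # shrink the unlabeled pool
    return learners
```

The margin-over-confidence selection here only mirrors the role the voting margin function plays in SECL and PECL; the actual algorithms differ in how they weight, retrain, and combine the classifiers.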
Fourthly, aiming at the characteristics of diversity-based semi-supervised learning, we define a hybrid classification-and-distribution (HCAD) noise and give a Probably Approximately Correct (PAC) analysis of co-training style algorithms in the presence of HCAD noise in the training data. Moreover, based on the voting margin, we derive an upper bound on the generalization error of multi-classifier voting systems in the presence of HCAD noise.
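The thesis's exact bound is not reproduced in this abstract; as a reference point, the standard weighted-vote margin on which such margin-based bounds are built can be written as follows (our notation, which may differ from the thesis's):

```latex
% Voting margin of a weighted ensemble \{(h_t, \alpha_t)\}_{t=1}^{T}
% on an example (x, y); standard definition, not necessarily the thesis's.
\[
  \operatorname{margin}(x, y) \;=\;
  \sum_{t=1}^{T} \alpha_t \,\mathbb{1}[h_t(x) = y]
  \;-\; \max_{y' \neq y} \sum_{t=1}^{T} \alpha_t \,\mathbb{1}[h_t(x) = y'],
  \qquad \sum_{t=1}^{T} \alpha_t = 1 .
\]
% A large positive margin means the weighted vote for the true label
% dominates every competing label, which is the quantity margin-based
% generalization bounds reward.
```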
Keywords/Search Tags: Semi-supervised Learning, Ensemble Learning, Diversity, Co-training, Classification