Learning From Limited And Imperfect Tagging

Posted on:2014-09-30

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Z A Qi

GTID:1268330425981375

Subject:Signal and Information Processing

Abstract/Summary:

Classification is an important research area in machine learning and data mining, and the choice of training set plays a key role in learning a classifier. However, with the explosive growth in the amount of data to be tagged, manually labeling data is extremely time-consuming and labor-intensive, so labeled training data are expensive to acquire. Moreover, in many real-world applications the tags attached to labeled data are imperfect: incomplete, inconsistent, and error-prone. Building on an extensive survey of classic classification algorithms in the literature, this thesis studies problems related to the training set for classification and proposes several novel generative models and discriminative classifiers. These methods exploit the information contained in unlabeled data, in the multi-label space, and in multiple views to address two problems: labeled training data are limited, and the tagging of labeled data is imperfect. The major contributions are as follows:

1. A novel generative model based on the Hierarchical Dirichlet Process is proposed for learning from incomplete tagging. By making full use of the labels initially given for the training data, the model updates the labels of incompletely tagged training instances and strengthens the correspondence between labels and features, which ultimately makes classification more accurate. The model can complete all missing labels in a partially labeled training set and predict labels for unlabeled data and for any new, unseen data.

2. Several novel discriminative classifiers are proposed for learning from noisy tagging; they use all of the multiple tags given for the training data simultaneously. Treating the given tags as an additional feature, we propose a novel distance measure that exploits the various relationships among the tags to find the neighborhood of each instance in the multi-label space. The proposed classifiers then use the information contained in the neighborhood of each training instance in different ways to mitigate the influence of noise on classification.

3. A general and effective learning scheme based on a discriminative classifier is proposed for both learning from incomplete tagging and learning from noisy tagging. Each training instance is assigned a weight reflecting its classification confidence, and these weights are updated during training using the redundancy among multiple views and the information in the multi-label space. The method can not only complete missing tags in an incompletely tagged training set but also correct wrong tags originally given in a noisily tagged training set.

4. A novel discriminative classification method is proposed for learning with limited and noisy tagging. By exploiting unlabeled data through semi-parametric regularization, the method learns information complementary to the labeled training data, which helps address the scarcity of labeled data. It also uses the information in the multi-label space to mitigate the influence of noise on classification.
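
Contribution 2 describes finding each instance's neighborhood in the multi-label space with a distance that treats the given tags as an extra feature, then using that neighborhood to reduce the effect of noisy tags. The abstract does not specify the actual distance measure or classifiers, so the following is only a minimal sketch under stated assumptions: the distance is a hypothetical weighted sum (weight `alpha`) of normalized Euclidean feature distance and Jaccard distance between tag sets, and a given tag is flagged as possibly noisy when fewer than a hypothetical `threshold` fraction of the k nearest neighbors carry it. All function names are illustrative and do not come from the thesis.

```python
import numpy as np


def combined_distance(X, Y, alpha=0.5):
    """Pairwise distance mixing feature distance and tag-set distance.

    Hypothetical measure (not the thesis's): alpha * (normalized Euclidean
    distance between rows of X) + (1 - alpha) * (Jaccard distance between
    the binary tag rows of Y).
    """
    diff = X[:, None, :] - X[None, :, :]
    feat = np.sqrt((diff ** 2).sum(axis=-1))
    if feat.max() > 0:
        feat = feat / feat.max()  # scale feature distances to [0, 1]
    inter = Y @ Y.T                      # number of shared tags
    sizes = Y.sum(axis=1)
    union = sizes[:, None] + sizes[None, :] - inter
    # Jaccard distance; two empty tag sets are treated as identical
    jacc = np.where(union > 0, 1.0 - inter / np.maximum(union, 1), 0.0)
    return alpha * feat + (1.0 - alpha) * jacc


def neighborhood_tag_scores(X, Y, k=3, alpha=0.5):
    """Fraction of each instance's k nearest neighbors that carry each tag."""
    D = combined_distance(X, Y, alpha)
    np.fill_diagonal(D, np.inf)          # an instance is not its own neighbor
    idx = np.argsort(D, axis=1)[:, :k]   # indices of the k nearest neighbors
    return Y[idx].mean(axis=1)           # shape (n_instances, n_tags)


def flag_suspect_tags(X, Y, k=3, alpha=0.5, threshold=0.5):
    """Mark given tags that too few neighbors share as possibly noisy."""
    scores = neighborhood_tag_scores(X, Y, k, alpha)
    return (Y == 1) & (scores < threshold)


# Toy data: two well-separated clusters; instance 3 carries a stray tag 1.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1]])
Y = np.array([[1, 0], [1, 0], [1, 0], [1, 1],
              [0, 1], [0, 1], [0, 1], [0, 1]])

flags = flag_suspect_tags(X, Y)
print(np.argwhere(flags))  # expected: [[3 1]]
```

In the toy data, instance 3 lies in the cluster tagged only with tag 0 but also carries tag 1; none of its three nearest neighbors share tag 1, so that tag is flagged. The thesis says its classifiers use neighborhood information in different ways; a plain neighbor vote is just one simple choice for illustration.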

Keywords/Search Tags:

multi-label classification, generative model, discriminative classifier, noisy tagging learning, incomplete tagging learning, semi-supervised learning, multi-view learning

Related items

1	Research On The Application Of Semi-supervised Learning In Natural Language Processing
2	Based On Incomplete Supervision Multi-label Classification Algorithm
3	Research On Multi-label Classification With Incomplete Label Information
4	Research On Semi-supervised Learning Classification Algorithm Based On Multi-view
5	Design And Implementation Of Clothing Image Auto-tagging System Based On Deep Learning
6	Research On Machine Learning Algorithms For Data With Multiple Annotations
7	Multi-label Image Classification Techniques Based On Semi-supervised Learning
8	Comparison And Improvement Of Two Methods Based On Semi-supervised Learning
9	Comparison And Improvement Of Two Methods Based On Semi-Supervised Learning
10	Research On The Dimensionality Reduction And Classification Algorithms In Multi-label Learning