
Learning From Limited And Imperfect Tagging

Posted on: 2014-09-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z A Qi
GTID: 1268330425981375
Subject: Signal and Information Processing
Abstract/Summary:
Classification is an important research area in machine learning and data mining, and the choice of training set plays a key role in learning a classifier. However, because the amount of data to be tagged is typically enormous, labeling it manually is very time-consuming and labor-intensive, so labeled training data are expensive to acquire. Moreover, in many real-world applications the tags of the labeled data are imperfect: incomplete, inconsistent, and error-prone. Building on an extensive survey of classic classification algorithms in the literature, this thesis studies problems related to the training set and proposes several novel generative models and discriminative classifiers. These methods exploit the information contained in the unlabeled data, in the multi-label space, and in multiple views to address two problems: the labeled training data are limited, and their tags are imperfect. The major contributions are as follows:

1. A novel generative model based on the Hierarchical Dirichlet Process is proposed for learning from incomplete tagging. By making full use of the information in the labels initially given for the training data, the model updates the labels of the incompletely tagged training data and strengthens the correspondence between labels and features, which ultimately makes classification more accurate. The model can complete all missing labels in a partially labeled training set and predict the labels of unlabeled data and of any new, unseen data.

2. Several novel discriminative classifiers are proposed for learning from noisy tagging; they use all of the multiple tags given for the training data simultaneously. Treating the given tags as an additional feature, we propose a novel distance measure that exploits the various relationships among the tags to find the neighborhood of each instance in the multi-label space. The proposed classifiers then use the information in the neighborhood of each training instance, in different ways, to mitigate the influence of noise on classification.

3. A general and effective learning scheme based on discriminative classifiers is proposed for both learning from incomplete tagging and learning from noisy tagging. Each training instance receives a weight reflecting the confidence of its classification, and these weights are updated during training using the redundancy among multiple views and the information in the multi-label space. The method can both complete the missing tags of an incompletely tagged training set and correct incorrect tags originally given in a noisily tagged training set.

4. A novel discriminative classification method is proposed for learning with limited and noisy tagging. By exploiting unlabeled data through semi-parametric regularization, the method learns information complementary to the labeled training data, which helps address the shortage of labeled data. It also uses the information in the multi-label space to mitigate the influence of noise on classification.
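The abstract does not specify the distance measure or classifiers of the second contribution, so the following is only a rough illustration of its general idea: treat the given tags as an extra feature, find each instance's neighborhood, and let that neighborhood dampen tag noise. It is a minimal sketch under stated assumptions: the distance (Euclidean distance on features plus `lam` times the Jaccard distance between tag vectors) and the neighborhood-voting correction are stand-ins chosen for simplicity, and the names `combined_distance`, `smooth_tags`, `k`, `lam`, and `alpha` are hypothetical. Unlike the thesis's measure, this distance ignores relationships among the tags.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Tags = Sequence[int]


def jaccard_distance(a: Tags, b: Tags) -> float:
    """1 - (tags present in both) / (tags present in either); 0.0 if both are empty."""
    inter = sum(1 for u, v in zip(a, b) if u and v)
    union = sum(1 for u, v in zip(a, b) if u or v)
    return 0.0 if union == 0 else 1.0 - inter / union


def combined_distance(x1: Vector, y1: Tags, x2: Vector, y2: Tags,
                      lam: float = 1.0) -> float:
    """Hypothetical distance: Euclidean on features plus lam * Jaccard on tags,
    so the given (possibly noisy) tags act as an additional feature."""
    return math.dist(x1, x2) + lam * jaccard_distance(y1, y2)


def smooth_tags(X: Sequence[Vector], Y: Sequence[Tags], k: int = 3,
                lam: float = 1.0, alpha: float = 0.4) -> List[List[int]]:
    """Hypothetical noise mitigation: re-estimate each instance's tags from
    its k nearest neighbors under combined_distance.

    Score for tag t = alpha * own tag + (1 - alpha) * neighbors' mean tag;
    the tag is set to 1 only if the score exceeds 0.5.
    """
    cleaned = []
    for i in range(len(X)):
        # Rank all other instances by the combined feature + tag distance.
        ranked = sorted(
            (combined_distance(X[i], Y[i], X[j], Y[j], lam), j)
            for j in range(len(X)) if j != i
        )
        neighbors = [j for _, j in ranked[:k]]
        row = []
        for t in range(len(Y[i])):
            neigh_mean = sum(Y[j][t] for j in neighbors) / len(neighbors)
            score = alpha * Y[i][t] + (1 - alpha) * neigh_mean
            row.append(1 if score > 0.5 else 0)
        cleaned.append(row)
    return cleaned


if __name__ == "__main__":
    # Two feature clusters tagged [1, 0] and [0, 1]; instance 3 has flipped (noisy) tags.
    X = [[0, 0], [0, 0.1], [0.1, 0], [0.1, 0.1],
         [5, 5], [5, 5.1], [5.1, 5], [5.1, 5.1]]
    Y = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1], [0, 1], [0, 1]]
    print(smooth_tags(X, Y))  # instance 3 is corrected to [1, 0]
```

Raising `lam` makes tag agreement count more heavily when choosing neighbors, while `alpha` controls how much an instance trusts its own given tags relative to its neighborhood. This is plain neighborhood voting, not the discriminative classifiers the thesis proposes.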
Keywords/Search Tags: multi-label classification, generative model, discriminative classifier, noisy tagging learning, incomplete tagging learning, semi-supervised learning, multi-view learning