
A Study On Classification Method Based On Integrated Utilization Of The Labeled And/or The Unlabeled Data

Posted on: 2017-05-18  Degree: Doctor  Type: Dissertation
Country: China  Candidate: A M Dong  Full Text: PDF
GTID: 1108330488980590  Subject: Light Industry Information Technology and Engineering
Abstract/Summary:
Pattern classification is an important research branch of machine learning. Traditional pattern classification comprises supervised and unsupervised classification: the training sets of the former are labeled, while those of the latter are unlabeled. As new applications continue to emerge, training sets increasingly contain both labeled and unlabeled data. Unlabeled data are plentiful, easy to obtain, and cheap, whereas labeled data are scarce, hard to obtain, and expensive. Furthermore, the unlabeled and the labeled data may come from different but related domains. In view of this, and building on the theory of the minimum enclosing ball, the core vector machine, dimensionality augmentation, and the common latent space, combined with the support vector machine, the dissertation proposed several pattern classification methods that make comprehensive use of labeled and/or unlabeled data in supervised, semi-supervised, and transfer classification settings. The main research achievements are as follows:

1) For supervised classification, the dissertation applied the classification problem with sparse labeled training data to recommendation systems and proposed a new recommendation algorithm together with a corresponding fast algorithm; in essence, it was a personalized recommendation algorithm based on a supervised classification model. Specifically, building on the support vector machine, the proposed algorithm combined the traditional recommendation algorithm with the theory of the minimum enclosing ball and the core vector machine, so that the traditional recommendation problem was transformed into a minimum enclosing ball problem and could therefore process big data quickly.
The proposed algorithm was applied to a movie recommendation system in experiments, and its efficiency was confirmed.

2) For semi-supervised classification, traditional semi-supervised classification algorithms easily propagate false labels during self-labeling when the labels of the labeled data have been partially attacked. To address this, the dissertation proposed, from the angle of features, a semi-supervised support vector machine model based on the principle of extended hidden features. The proposed algorithm first augmented the original features of the data through an orthonormal row transformation, based on the principle of minimal integrated squared error between the probability distributions of the labeled and the unlabeled data; it then trained the final classifier in the extended feature space using the large-margin principle of the support vector machine. The related experimental results confirmed the efficiency of the proposed algorithm.

3) For semi-supervised classification, building on the semi-supervised support vector machine classification algorithm based on extended hidden features, and aiming to improve the running time and the safety of using the unlabeled data, the dissertation proposed a novel semi-supervised classification algorithm based on oversampling and a common latent space. The proposed algorithm first generated synthetic data from the labeled and the unlabeled data by oversampling, then found the common latent space between the original labeled data and the synthetic data by minimizing the integrated squared error between their probability distributions; lastly, it trained on the labeled data in the feature space composed of the original feature space and the common latent space.
The related experimental results confirmed the efficiency of the proposed algorithm.

4) For transfer classification, in order to fully exploit the common knowledge between different but related domains, the dissertation proposed a novel feature-based transfer classification algorithm from the angle of feature transformation. The proposed algorithm fully considered the constraints of the original feature space and of the common low-dimensional hidden space between domains. Specifically, it first introduced a feature transformation matrix as a parameter representing the common knowledge between domains and projected the data from the different domains into a common low-dimensional feature space. Subsequently, it constructed a combined decision function based on the original feature space and the common latent space; lastly, it embedded the original and the low-dimensional feature spaces into the training of the support vector machine and obtained an efficient classifier for the target domain. The related experimental results confirmed the efficiency of the proposed algorithm.

5) For transfer classification, in order to fully exploit the common knowledge between different but related domains and to avoid negative transfer, the dissertation assumed, from the angle of data attributes, that some common hidden features exist between different domains, and proposed a transfer common feature support vector machine algorithm.
The proposed algorithm first constructed the common feature space between the source and the target domains by minimizing the joint probability distribution of the unlabeled data from the target domain and the labeled data from the source domain; second, fully considering the distribution of the labeled data and potential attacks on the labels of the labeled data, it trained on the labeled data from the source domain in the extended feature space composed of the original and the common feature spaces and obtained the final classifier. The related experimental results also confirmed the efficiency of the proposed transfer classifier.
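
As a rough illustration of the minimum enclosing ball idea that achievement 1) relies on, the following sketch approximates the smallest ball containing a set of points. It uses the simple Badoiu-Clarkson iteration (move the center a shrinking step toward the current farthest point), which is an assumption made here for illustration; the abstract does not specify the dissertation's own formulation, and this is not the core vector machine or the proposed recommendation algorithm. The function name `approx_meb` and the parameter `eps` are hypothetical.

```python
import math


def approx_meb(points, eps=0.05):
    """Approximate the minimum enclosing ball of `points`.

    Illustrative Badoiu-Clarkson iteration, not the dissertation's method.
    After about 1/eps^2 steps the returned radius is at most (1 + eps)
    times the optimal radius.
    """
    center = list(points[0])  # start at an arbitrary input point
    iterations = math.ceil(1.0 / eps ** 2)
    for i in range(1, iterations + 1):
        # Find the point currently farthest from the center.
        farthest = max(points, key=lambda p: math.dist(p, center))
        # Move the center toward it with a step that shrinks as 1/(i+1).
        step = 1.0 / (i + 1)
        center = [c + step * (f - c) for c, f in zip(center, farthest)]
    # The radius is the largest distance to any point, so the ball encloses all.
    radius = max(math.dist(p, center) for p in points)
    return center, radius


if __name__ == "__main__":
    square = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
    center, radius = approx_meb(square)
    print(center, radius)
```

Each iteration only needs one pass over the data to find the farthest point, which is why ball-based reformulations of this kind are attractive for large training sets.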
Keywords/Search Tags: Recommendation system, Minimum enclosing ball, Pattern classification, Semi-supervised learning, Transfer learning, Common latent space, Support vector machine