
Research On Partially Supervised Classification

Posted on: 2015-03-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T Ke
Full Text: PDF
GTID: 1268330428961726
Subject: Strategy and management
Abstract/Summary:
Traditional machine learning techniques require a large number of labeled data points for training. In many real-world applications, however, obtaining a large amount of labeled data is difficult and costly: human labeling is labor-intensive and time-consuming, and relies on the efforts of a small number of domain experts and knowledge engineers, whereas a large amount of unlabeled data is often cheaply and readily available. How to learn from a large amount of unlabeled data together with a small number of labeled data points has therefore attracted considerable interest in pattern recognition and machine learning in recent years, especially semi-supervised binary classification and PU learning. The goal of semi-supervised binary classification is to construct a classifier or a score function using examples from two categories (positive and negative) together with unlabeled examples, whereas the objective of PU learning is to design a robust and principled technique using only positive examples and unlabeled examples. The machine learning problems studied in this dissertation are semi-supervised classification and PU learning, collectively called partially supervised classification.

Compared with semi-supervised binary classification, the key feature of PU learning is that there is no labeled negative training data, which makes traditional classification techniques inapplicable, so new approaches must be found. In addition, no common method currently solves semi-supervised classification and PU learning simultaneously. We therefore put forward a variety of learning models and algorithms for partially supervised classification. More concretely, this dissertation makes the following contributions:

1.
A novel support vector machine classifier based on multi-level reliability (MLR-SVM) is constructed for PU learning; it can be regarded as an improvement of Biased-SVM. More specifically, we construct an SVM classifier by introducing an extra parameter to weight the errors of the reliable positive examples extracted from the unlabeled examples. In addition, a heuristic algorithm is adopted for MLR-SVM. Experiments on two real applications, text classification and bioinformatics classification, show that MLR-SVM is more effective than Biased-SVM and other popular two-step methods such as ROC-SVM and S-EM.

2. We put forward a biased least squares support vector machine (Biased-LSSVM) and a biased proximal support vector machine (Biased-PSVM) for PU learning. The idea of these approaches is similar to that of Biased-SVM, but the proposed classifiers have three advantages. First, Biased-LSSVM and Biased-PSVM reflect the class labels of all examples more sufficiently and accurately than Biased-SVM. Second, Biased-LSSVM and Biased-PSVM are more stable than Biased-SVM. Finally, the time complexity of Biased-LSSVM and Biased-PSVM is lower than that of Biased-SVM: the former two only need to solve linear equations, whereas Biased-SVM requires solving a quadratic program. Experiments on the same two real applications, text classification and bioinformatics classification, verify these claims and show that Biased-LSSVM and Biased-PSVM are more effective than Biased-SVM and other popular methods.

3. We investigate transductive partially supervised classification, which covers transductive semi-supervised classification and transductive PU learning. At present, graph-based approaches are the most classical and popular for transductive semi-supervised classification. Their core idea is that similar examples should receive similar labels. There is, however, another relationship between examples: dissimilarity.
Different from existing graph-based methods, our models incorporate an extra term, called the dissimilar term, which drives dissimilar examples toward different labels. The dissimilar term not only reflects the geometric structure of both labeled and unlabeled examples from a new point of view but also turns transductive PU learning into transductive semi-supervised binary classification. Experimental results on several real-world datasets show that our two models are more effective than classical techniques.
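The biased-weighting idea behind Biased-SVM, which contributions 1 and 2 build on, can be sketched in a few lines: treat every unlabeled example as negative, but penalize misclassified positives much more heavily than misclassified unlabeled points. The sketch below is only an illustration of that idea, not the dissertation's actual formulation; the subgradient hinge-loss trainer, the cost values c_pos and c_neg, and the synthetic data are all assumptions introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic PU data: 20 known positives near (2, 2) and 40 "unlabeled"
# points, drawn here near (-2, -2) for a clean illustration. Following
# the Biased-SVM convention, every unlabeled point is coded as -1.
X_pos = rng.normal(loc=(2.0, 2.0), scale=0.5, size=(20, 2))
X_unl = rng.normal(loc=(-2.0, -2.0), scale=0.5, size=(40, 2))
X = np.vstack([X_pos, X_unl])
y = np.hstack([np.ones(20), -np.ones(40)])

def biased_svm(X, y, c_pos=10.0, c_neg=1.0, lam=0.01, lr=0.01,
               epochs=100, seed=1):
    """Subgradient descent on an asymmetric-cost hinge loss:
    lam/2 ||w||^2 + sum_i C_i * max(0, 1 - y_i (w.x_i + b)),
    with C_i = c_pos for labeled positives and C_i = c_neg for
    unlabeled points treated as negatives (hypothetical names)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            c = c_pos if y[i] > 0 else c_neg
            if y[i] * (X[i] @ w + b) < 1.0:  # margin violated
                w = (1 - lr * lam) * w + lr * c * y[i] * X[i]
                b += lr * c * y[i]
            else:
                w = (1 - lr * lam) * w
    return w, b

w, b = biased_svm(X, y)
```

Because c_pos far exceeds c_neg, the learned hyperplane tolerates some "negative" errors among the unlabeled points (some of which may be hidden positives) while keeping the labeled positives on the correct side, which is the asymmetry that makes this family of methods usable without any labeled negatives.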
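The dissimilar term of contribution 3 can be illustrated on a toy graph with a simple quadratic objective: a similarity Laplacian pulls connected examples toward the same score, while a subtracted dissimilarity Laplacian pushes cross-cluster pairs apart, so a single positive label is enough to drive the other cluster negative. This is a minimal sketch under assumed parameter names (beta, mu, eps) and an assumed objective, not the dissertation's actual model.

```python
import numpy as np

# Toy graph: nodes 0-2 form cluster A, nodes 3-5 form cluster B.
# Similarity edges connect nodes within a cluster; dissimilarity
# edges connect nodes across clusters. Only node 0 is labeled (+1),
# mimicking the PU setting: no negative labels at all.
n = 6
sim_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
dis_edges = [(0, 3), (1, 4), (2, 5)]

def laplacian(edges, n):
    """Unnormalized graph Laplacian L = D - W for unit-weight edges."""
    W = np.zeros((n, n))
    for i, j in edges:
        W[i, j] = W[j, i] = 1.0
    return np.diag(W.sum(axis=1)) - W

L_sim = laplacian(sim_edges, n)
L_dis = laplacian(dis_edges, n)

# Assumed quadratic objective over scores f:
#   f' L_sim f - beta * f' L_dis f + mu * ||M (f - y)||^2 + eps * ||f||^2
# The negative dissimilarity term rewards large (f_i - f_j)^2 on
# dissimilar pairs; eps keeps the system positive definite.
beta, mu, eps = 0.25, 10.0, 0.6
M = np.diag([1.0, 0, 0, 0, 0, 0])   # indicator of labeled nodes
y = np.array([1.0, 0, 0, 0, 0, 0])  # the single positive label

# Setting the gradient to zero gives one linear system for f.
A = L_sim - beta * L_dis + mu * M + eps * np.eye(n)
f = np.linalg.solve(A, mu * M @ y)
```

On this toy instance the scores of cluster A come out positive and those of cluster B negative, showing how the dissimilar term manufactures the missing negative class and turns a transductive PU problem into an ordinary two-class transductive one.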
Keywords/Search Tags: semi-supervised classification, positive and unlabeled learning, Support Vector Machine (SVM), text classification, bioinformatics