
The Study Of Active Learning Methods Based On Uncertainty And Margin Sampling

Posted on: 2016-02-29
Degree: Master
Type: Thesis
Country: China
Candidate: Zhou
Full Text: PDF
GTID: 2308330461972733
Subject: Computer application technology
Abstract/Summary:
Serving as one of the most important research directions in machine learning, active learning has attracted increasing attention in recent years. In this paper, we study two classical active learning algorithms and propose two improved methods by combining them with the manifold-preserving graph reduction algorithm (MPGR).

Active learning of Gaussian processes (GPAL) is a popular uncertainty-based active learning method. GPAL can effectively use a small number of labeled examples to train a classifier: according to a sampling criterion, it selects the most uncertain examples from the unlabeled data for manual labeling. However, when several examples are chosen simultaneously, GPAL considers only the informativeness of the unlabeled data, without exploiting their distribution and the spatial connectivity of their structure. As a result, more than one point may be selected from a single small region. Points in such a region are likely to carry nearly the same information, so there is no need to select all of them; worse, the resulting redundancy decreases classification accuracy.

To overcome this shortcoming, we apply MPGR to the original GPAL and propose a new method, GPMAL. MPGR is an efficient graph reduction algorithm based on the manifold assumption, which selects a subset representing the global manifold structure of the original data distribution. When selecting unlabeled examples, GPMAL therefore considers both the informativeness and the representativeness criteria. Compared with the original selection criterion, that of GPMAL is more comprehensive and reasonable. The experimental results indicate that GPMAL outperforms GPAL.

Margin sampling (MS) is another common active learning approach, based on support vector machines (SVM). MS tends to choose the examples nearest to the separating hyperplane.
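The two selection ideas discussed above, uncertainty-based sampling and MPGR-style representative subset selection, can be illustrated with a minimal numpy sketch. The function names, the probability-based uncertainty measure, and the greedy degree-based reduction below are our own simplifications for illustration, not the thesis's exact GPAL or MPGR algorithms.

```python
import numpy as np

def uncertainty_select(probs, k):
    """Uncertainty sampling: pick the k pool examples whose predicted
    positive-class probability is closest to 0.5, i.e. those the
    classifier (e.g. a Gaussian process) is least confident about."""
    return np.argsort(np.abs(probs - 0.5))[:k]

def mpgr_like_subset(X, k, radius=1.0):
    """Greedy sketch of the MPGR idea: repeatedly keep the point with
    the most neighbors inside `radius` (highest space connectivity),
    then drop it and its neighbors, so the kept points spread over the
    data manifold instead of piling up in one dense region."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    adj = (dists < radius) & ~np.eye(len(X), dtype=bool)
    remaining = np.ones(len(X), dtype=bool)
    kept = []
    while len(kept) < k and remaining.any():
        # degree of each still-available point, counting only available neighbors
        degree = np.where(remaining, (adj & remaining).sum(axis=1), -1)
        i = int(np.argmax(degree))
        kept.append(i)
        remaining[i] = False
        remaining[adj[i]] = False  # suppress this point's neighborhood
    return kept
```

In the spirit of GPMAL, one would first reduce the unlabeled pool with the MPGR-style step (representativeness) and then apply the uncertainty criterion within the reduced subset (informativeness); the exact way the two criteria are combined follows the thesis.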
The nearer an example is to the hyperplane, the lower the classifier's confidence on it. Analogously, we find that when several examples are chosen simultaneously, MS suffers from the same drawback as GPAL, namely oversampling of dense regions. Motivated by this, we apply MPGR to the original MS and propose an improved method called IMS, which combines the informativeness and representativeness criteria. The experimental results indicate that, by exploiting MPGR, the performance of IMS is likewise improved.

Moreover, we not only compare the classification performance of each method before and after improvement, but also compare the Gaussian process and margin sampling active learning algorithms (including their improved versions) in terms of both classification performance and computing time. The experiments verify that Gaussian process active learning has the advantage in classification performance, while margin sampling active learning has the advantage in time cost.
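The margin sampling criterion just described admits an equally small sketch. The decision values would come from a trained SVM's decision function; the function name is ours, and this is an illustration of the criterion rather than the thesis's exact IMS procedure.

```python
import numpy as np

def margin_select(decision_values, k):
    """Margin sampling: pick the k unlabeled examples lying closest to
    the SVM hyperplane, i.e. with the smallest |f(x)|, since these are
    the examples the classifier is least confident about."""
    return np.argsort(np.abs(decision_values))[:k]

# signed distances of five pool examples to an assumed SVM hyperplane
d = np.array([2.1, -0.3, 0.05, -1.7, 0.6])
print(margin_select(d, 2))  # the two examples nearest the hyperplane
```

IMS would then apply this criterion within an MPGR-reduced subset of the pool, so that the chosen examples are both informative (near the hyperplane) and representative (spread over the manifold).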
Keywords/Search Tags: Active learning, Gaussian process, Margin sampling, Support vector machine, Manifold-preserving graph reduction