
The Study Of Active Learning Methods Based On Uncertainty And Margin Sampling

Posted on: 2016-02-29
Degree: Master
Type: Thesis
Country: China
Candidate: Zhou
Full Text: PDF
GTID: 2308330461972733
Subject: Computer application technology
Abstract/Summary:
Serving as one of the most important research directions in machine learning, active learning has attracted increasing attention in recent years. In this paper, we study two classical active learning algorithms and propose two improved methods by combining them with the manifold-preserving graph reduction algorithm (MPGR).

Active learning of Gaussian processes (GPAL) is a popular uncertainty-based active learning method. GPAL can effectively use a small number of labeled examples to train a classifier: according to a sampling criterion, it selects the most uncertain examples from the unlabeled data for manual labeling. However, when several examples are chosen simultaneously, GPAL considers only the informativeness of the unlabeled data, without exploiting their distribution and the spatial connectivity of their structure. As a result, more than one point may be selected from a single small region. Points in such a region are likely to carry nearly the same information, so there is no need to select all of them; worse, the resulting redundancy decreases classification accuracy.

To overcome this shortcoming, we apply MPGR to the original GPAL and propose a new method, GPMAL. MPGR is an efficient graph reduction algorithm based on the manifold assumption, which selects a subset representing the global manifold structure of the original data distribution. When selecting unlabeled examples, GPMAL therefore considers both the informativeness and the representativeness criteria. Compared with the original selection criterion, that of GPMAL is more comprehensive and reasonable. The experimental results indicate that GPMAL outperforms GPAL.

Margin sampling (MS) is another common active learning approach, based on support vector machines (SVM). MS tends to choose the examples nearest to the separating hyperplane.
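The two selection ideas discussed above, uncertainty-based sampling and MPGR-style representative subset selection, can be illustrated with a minimal numpy sketch. The function names, the probability-based uncertainty measure, and the greedy degree-based reduction below are our own simplifications for illustration, not the thesis's exact GPAL or MPGR algorithms.

```python
import numpy as np

def uncertainty_select(probs, k):
    """Uncertainty sampling: pick the k pool examples whose predicted
    positive-class probability is closest to 0.5, i.e. those the
    classifier (e.g. a Gaussian process) is least confident about."""
    return np.argsort(np.abs(probs - 0.5))[:k]

def mpgr_like_subset(X, k, radius=1.0):
    """Greedy sketch of the MPGR idea: repeatedly keep the point with
    the most neighbors inside `radius` (highest space connectivity),
    then drop it and its neighbors, so the kept points spread over the
    data manifold instead of piling up in one dense region."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    adj = (dists < radius) & ~np.eye(len(X), dtype=bool)
    remaining = np.ones(len(X), dtype=bool)
    kept = []
    while len(kept) < k and remaining.any():
        # degree of each still-available point, counting only available neighbors
        degree = np.where(remaining, (adj & remaining).sum(axis=1), -1)
        i = int(np.argmax(degree))
        kept.append(i)
        remaining[i] = False
        remaining[adj[i]] = False  # suppress this point's neighborhood
    return kept
```

In the spirit of GPMAL, one would first reduce the unlabeled pool with the MPGR-style step (representativeness) and then apply the uncertainty criterion within the reduced subset (informativeness); the exact way the two criteria are combined follows the thesis.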
The nearer an example is to the hyperplane, the lower the classifier's confidence on it. Analogously, we find that when several examples are chosen simultaneously, MS suffers from the same drawback as GPAL, namely oversampling of dense regions. Motivated by this, we apply MPGR to the original MS and propose an improved method called IMS, which combines the informativeness and representativeness criteria. The experimental results indicate that, by exploiting MPGR, the performance of IMS is likewise improved.

Moreover, we not only compare the classification performance of each method before and after improvement, but also compare the Gaussian process and margin sampling active learning algorithms (including their improved versions) in terms of both classification performance and computing time. The experiments verify that Gaussian process active learning has the advantage in classification performance, while margin sampling active learning has the advantage in time cost.
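The margin sampling criterion just described admits an equally small sketch. The decision values would come from a trained SVM's decision function; the function name is ours, and this is an illustration of the criterion rather than the thesis's exact IMS procedure.

```python
import numpy as np

def margin_select(decision_values, k):
    """Margin sampling: pick the k unlabeled examples lying closest to
    the SVM hyperplane, i.e. with the smallest |f(x)|, since these are
    the examples the classifier is least confident about."""
    return np.argsort(np.abs(decision_values))[:k]

# signed distances of five pool examples to an assumed SVM hyperplane
d = np.array([2.1, -0.3, 0.05, -1.7, 0.6])
print(margin_select(d, 2))  # the two examples nearest the hyperplane
```

IMS would then apply this criterion within an MPGR-reduced subset of the pool, so that the chosen examples are both informative (near the hyperplane) and representative (spread over the manifold).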
Keywords/Search Tags: Active learning, Gaussian process, Margin sampling, Support vector machine, Manifold-preserving graph reduction