
Study Of Stopping Criteria And Performance Evaluation Metrics In Active Learning

Posted on: 2017-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: J Yang
Full Text: PDF
GTID: 2348330503968098
Subject: Computer technology
Abstract/Summary:
Active learning, one of the hot topics in machine learning, aims to minimize the amount of human labeling effort required for a supervised classifier to achieve satisfactory performance. In active learning applications, the design of the stopping criterion is an important and practical issue, because it makes little sense to continue the active learning procedure until every unlabeled sample has been labeled. In addition, some metrics must be defined in advance for evaluating the performance of a given active learning algorithm, an issue that has been largely neglected in previous work on active learning. Accordingly, this thesis focuses on two issues: stopping criteria and performance evaluation metrics for active learning.

First, the thesis presents several simple stopping criteria defined over the unlabeled data pool. Because the selected accuracy stopping criterion can only be applied in batch-mode active learning, an improved criterion for the single-labeling mode is proposed. The agreement between each predicted label and the corresponding true label over a pre-defined number of learning rounds is used to approximate the selected accuracy: the higher the match quality, the higher the estimated selected accuracy. The variation of the selected accuracy is then monitored with a sliding window, and active learning stops once the selected accuracy exceeds a pre-defined threshold. Experiments on six benchmark data sets with an active learning algorithm based on a support vector machine (SVM) classifier demonstrate the effectiveness and feasibility of the proposed criterion: with an appropriate threshold, active learning stops at the right time.
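The sliding-window mechanism described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the class name, the window size, and the threshold value are all hypothetical, and the thesis may estimate the match quality differently.

```python
from collections import deque


class SelectedAccuracyStopper:
    """Sliding-window stopping criterion for single-labeling active learning.

    After each query, the model's predicted label for the queried sample is
    compared with the label supplied by the human annotator; the fraction of
    matches inside a sliding window approximates the selected accuracy.
    """

    def __init__(self, window_size=50, threshold=0.9):
        self.window_size = window_size
        self.threshold = threshold
        # deque with maxlen automatically discards the oldest match indicator,
        # which implements the moving sliding window.
        self.matches = deque(maxlen=window_size)

    def update(self, predicted_label, true_label):
        """Record whether the prediction matched the annotator's label."""
        self.matches.append(int(predicted_label == true_label))

    def should_stop(self):
        """Stop once the windowed selected accuracy reaches the threshold."""
        if len(self.matches) < self.window_size:
            return False  # not enough rounds observed yet
        return sum(self.matches) / self.window_size >= self.threshold
```

In use, the active learner would call `update` after each labeling round and terminate querying as soon as `should_stop` returns `True`.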
The proposed method broadens the applicability of the selected accuracy stopping criterion.

At present there are various active learning algorithms, but they share a common performance evaluation measure: the learning curve. Because the learning curve shows how the quality of the classification model varies over the whole iterative learning procedure, most research articles on active learning compare the performance of different algorithms directly by their learning curves. However, the learning curve sometimes cannot reveal the slight difference between two similar active learning algorithms. To address this problem, the thesis presents four quantitative performance evaluation metrics that exploit the latent information behind the learning curve: area under the learning curve (ALC), logarithmic area under the learning curve (LALC), average gradient angle (AGA), and logarithmic average gradient angle (LAGA). All four metrics give impartial evaluation results for active learning algorithms based on the same (homogeneous) classifier; when active learning algorithms based on heterogeneous classifiers must be compared, AGA and LAGA are more suitable than the other two. In addition, LALC and LAGA emphasize the learning speed in the initial stage of the learning procedure more than the other two. Experimental results on nine data sets with multiple baseline active learning algorithms demonstrate the effectiveness of the four metrics.
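Two of the curve-based metrics can be illustrated with short functions. This is one plausible reading of ALC and AGA, not the thesis's exact definitions: the normalization of the area and the use of per-segment arctangents are assumptions, and the logarithmic variants (LALC, LAGA) would apply the same computations after log-scaling the x-axis.

```python
import math


def area_under_learning_curve(sizes, accuracies):
    """Trapezoidal area under a learning curve (one reading of ALC),
    normalized by the x-range so curves over different query budgets
    remain comparable."""
    area = 0.0
    for i in range(1, len(sizes)):
        area += (accuracies[i] + accuracies[i - 1]) / 2 * (sizes[i] - sizes[i - 1])
    return area / (sizes[-1] - sizes[0])


def average_gradient_angle(sizes, accuracies):
    """Mean arctangent of the segment slopes (one reading of AGA, in
    radians); larger values indicate faster improvement per labeled
    sample, which is why it rewards steep early learning curves."""
    angles = [
        math.atan2(accuracies[i] - accuracies[i - 1], sizes[i] - sizes[i - 1])
        for i in range(1, len(sizes))
    ]
    return sum(angles) / len(angles)
```

Because AGA measures the shape (steepness) of the curve rather than its absolute height, it is less sensitive to the baseline accuracy of the underlying classifier, which is consistent with the claim that AGA and LAGA suit comparisons across heterogeneous classifiers.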
Keywords/Search Tags:active learning, stopping criterion, learning curves, performance evaluation metrics