
Study Of Stopping Criteria And Performance Evaluation Metrics In Active Learning

Posted on: 2017-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: J Yang
Full Text: PDF
GTID: 2348330503968098
Subject: Computer technology
Abstract/Summary:
Active learning, one of the hot topics in machine learning, aims to minimize the amount of human labeling effort required for a supervised classifier to achieve satisfactory performance. In active learning applications, the design of the stopping criterion is an important and practical issue, because it makes little sense to continue the active learning procedure until every unlabeled sample has been labeled. In addition, some metrics must be defined in advance for evaluating the performance of a given active learning algorithm, an issue that has been largely neglected in previous work on active learning. Accordingly, this thesis focuses on two issues: stopping criteria and performance evaluation metrics for active learning.

First, the thesis presents several simple stopping criteria defined over the unlabeled data pool. Because the selected accuracy stopping criterion can only be applied in batch-mode active learning, an improved criterion for the single-labeling mode is proposed. The agreement between each predicted label and the corresponding true label over a pre-defined number of learning rounds is used to approximate the selected accuracy: the higher the match quality, the higher the estimated selected accuracy. The variation of the selected accuracy is then monitored with a sliding window, and active learning stops once the selected accuracy exceeds a pre-defined threshold. Experiments on six benchmark data sets with an active learning algorithm based on a support vector machine (SVM) classifier demonstrate the effectiveness and feasibility of the proposed criterion: with an appropriate threshold, active learning stops at the right time.
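The sliding-window mechanism described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the class name, the window size, and the threshold value are all hypothetical, and the thesis may estimate the match quality differently.

```python
from collections import deque


class SelectedAccuracyStopper:
    """Sliding-window stopping criterion for single-labeling active learning.

    After each query, the model's predicted label for the queried sample is
    compared with the label supplied by the human annotator; the fraction of
    matches inside a sliding window approximates the selected accuracy.
    """

    def __init__(self, window_size=50, threshold=0.9):
        self.window_size = window_size
        self.threshold = threshold
        # deque with maxlen automatically discards the oldest match indicator,
        # which implements the moving sliding window.
        self.matches = deque(maxlen=window_size)

    def update(self, predicted_label, true_label):
        """Record whether the prediction matched the annotator's label."""
        self.matches.append(int(predicted_label == true_label))

    def should_stop(self):
        """Stop once the windowed selected accuracy reaches the threshold."""
        if len(self.matches) < self.window_size:
            return False  # not enough rounds observed yet
        return sum(self.matches) / self.window_size >= self.threshold
```

In use, the active learner would call `update` after each labeling round and terminate querying as soon as `should_stop` returns `True`.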
The proposed method broadens the applicability of the selected accuracy stopping criterion.

At present there are various active learning algorithms, but they share a common performance evaluation measure: the learning curve. Because the learning curve shows how the quality of the classification model varies over the whole iterative learning procedure, most research articles on active learning compare the performance of different algorithms directly by their learning curves. However, the learning curve sometimes cannot reveal the slight difference between two similar active learning algorithms. To address this problem, the thesis presents four quantitative performance evaluation metrics that exploit the latent information behind the learning curve: area under the learning curve (ALC), logarithmic area under the learning curve (LALC), average gradient angle (AGA), and logarithmic average gradient angle (LAGA). All four metrics give impartial evaluation results for active learning algorithms based on the same (homogeneous) classifier; when active learning algorithms based on heterogeneous classifiers must be compared, AGA and LAGA are more suitable than the other two. In addition, LALC and LAGA emphasize the learning speed in the initial stage of the learning procedure more than the other two. Experimental results on nine data sets with multiple baseline active learning algorithms demonstrate the effectiveness of the four metrics.
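Two of the curve-based metrics can be illustrated with short functions. This is one plausible reading of ALC and AGA, not the thesis's exact definitions: the normalization of the area and the use of per-segment arctangents are assumptions, and the logarithmic variants (LALC, LAGA) would apply the same computations after log-scaling the x-axis.

```python
import math


def area_under_learning_curve(sizes, accuracies):
    """Trapezoidal area under a learning curve (one reading of ALC),
    normalized by the x-range so curves over different query budgets
    remain comparable."""
    area = 0.0
    for i in range(1, len(sizes)):
        area += (accuracies[i] + accuracies[i - 1]) / 2 * (sizes[i] - sizes[i - 1])
    return area / (sizes[-1] - sizes[0])


def average_gradient_angle(sizes, accuracies):
    """Mean arctangent of the segment slopes (one reading of AGA, in
    radians); larger values indicate faster improvement per labeled
    sample, which is why it rewards steep early learning curves."""
    angles = [
        math.atan2(accuracies[i] - accuracies[i - 1], sizes[i] - sizes[i - 1])
        for i in range(1, len(sizes))
    ]
    return sum(angles) / len(angles)
```

Because AGA measures the shape (steepness) of the curve rather than its absolute height, it is less sensitive to the baseline accuracy of the underlying classifier, which is consistent with the claim that AGA and LAGA suit comparisons across heterogeneous classifiers.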
Keywords/Search Tags:active learning, stopping criterion, learning curves, performance evaluation metrics