
Research On The Method Of Selecting Samples For Pool Model In Active Learning

Posted on: 2021-03-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Wang
GTID: 2428330605979314
Subject: Computer application technology
Abstract/Summary:
Active learning addresses the fact that supervised learning needs a large number of labeled training samples. Its core problem is the design of the sample selection strategy, whose goal is to make the model converge quickly. In the pool model, samples are selected in batches, so information redundancy may exist between the selected samples, which reduces the efficiency of active learning. This research finds that information redundancy arises mainly in two places: within the sample set to be labeled, and between the sample set to be labeled and the labeled sample set. For neural networks that contain an MLP (Multi-Layer Perceptron), an information redundancy matrix is defined, and on that basis two optimization methods are constructed: DRAL (Discrimination And Redundancy Active Learning) and LDRAL (Labeled Discriminative And Redundancy Active Learning). DRAL targets the redundancy within the sample set to be labeled. A candidate set consisting of many unlabeled samples is first selected with the original selection method. The sample set to be labeled is initialized with the samples that are most similar to the candidate set, and then, in each step, the candidate sample that is most dissimilar to the current sample set to be labeled is added to it. LDRAL targets the redundancy between the sample set to be labeled and the labeled sample set. As the number of iterations increases, the labeled sample set grows larger and larger, and the cost of computation eventually exceeds the hardware limit. An uncertainty threshold is therefore defined to select an uncertain labeled sample set, which replaces the full labeled sample set. The samples in the candidate set that are most dissimilar to the uncertain labeled sample set are then selected as the sample set to be labeled. On the MNIST, Fashion-MNIST and CIFAR-10 datasets, when the uncertainty reduction method is used to select samples, the DRAL method reduces the number of labeled samples needed to reach the same accuracy by at least 11%, and the LDRAL method reduces it by at least 8.3%, which shows that both methods effectively mitigate the information redundancy problem.
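The two selection procedures described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. The abstract does not give the definition of the information redundancy matrix, so pairwise cosine similarity between feature vectors (for example, MLP-layer activations) is used here as a stand-in. The function names, the `uncertainty` scores used to build the candidate set, and the use of maximum similarity as the redundancy score are all assumptions made for illustration.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between rows of a and rows of b.

    Assumption: stands in for the thesis's information redundancy
    matrix, whose exact definition the abstract does not give.
    """
    a_unit = a / np.clip(np.linalg.norm(a, axis=1, keepdims=True), 1e-12, None)
    b_unit = b / np.clip(np.linalg.norm(b, axis=1, keepdims=True), 1e-12, None)
    return a_unit @ b_unit.T


def dral_like_select(features, uncertainty, candidate_size, batch_size):
    """DRAL-style sketch: reduce redundancy inside the batch to be labeled."""
    # Candidate set: the most uncertain unlabeled samples (assumed base method).
    candidates = np.argsort(-uncertainty)[:candidate_size]
    sim = cosine_similarity_matrix(features[candidates], features[candidates])

    # Seed: the candidate most similar, on average, to the candidate set.
    seed = int(np.argmax(sim.mean(axis=1)))
    chosen = [seed]

    # Track each candidate's highest similarity to anything already chosen.
    max_sim = sim[:, seed].copy()
    max_sim[seed] = np.inf  # never pick the same sample twice

    # Greedily add the candidate least similar to the current batch.
    while len(chosen) < min(batch_size, len(candidates)):
        nxt = int(np.argmin(max_sim))
        chosen.append(nxt)
        max_sim = np.maximum(max_sim, sim[:, nxt])
        max_sim[nxt] = np.inf
    return candidates[chosen]


def ldral_like_select(features, uncertainty, labeled_features,
                      labeled_uncertainty, threshold,
                      candidate_size, batch_size):
    """LDRAL-style sketch: reduce redundancy against labeled samples."""
    candidates = np.argsort(-uncertainty)[:candidate_size]

    # Keep only labeled samples the model is still uncertain about,
    # so the comparison set stays small as the labeled set grows.
    uncertain_labeled = labeled_features[labeled_uncertainty >= threshold]
    if len(uncertain_labeled) == 0:
        return candidates[:batch_size]

    sim = cosine_similarity_matrix(features[candidates], uncertain_labeled)
    redundancy = sim.max(axis=1)  # closeness to the nearest uncertain labeled sample
    return candidates[np.argsort(redundancy)[:batch_size]]
```

In this sketch the uncertainty threshold keeps the LDRAL-style comparison at `candidate_size` by the number of uncertain labeled samples, rather than by the size of the full labeled set, which is the computational concern the abstract raises.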
Keywords/Search Tags: Sample selection for the pool model, Information redundancy problem, Sample set to be labeled, Uncertain labeled sample set, Uncertainty reduction method