Improved Label Noise Filtering Method Based On Active Learning

Posted on: 2022-09-12
Degree: Master
Type: Thesis
Country: China
Candidate: T Wang
GTID: 2518306521481644
Subject: Economic big data analysis
Abstract/Summary:
With the development of machine learning, the demand for data annotation and collection has surged. However, because annotators often lack experience and domain expertise, and because many labeling tasks are subjective, labels collected through crowdsourcing platforms frequently contain a large amount of noise. Against this background, this thesis proposes a label noise filtering algorithm based on the idea of active learning. The goal is to obtain, at low processing cost, a data set that has a low noise rate and still supports effective model training. The method consists of two steps, "noise sample recognition and filtering" and "active learning to screen important samples", and it improves the quality of the data set iteratively. Compared with methods that simply delete noisy samples, it avoids losing important information, and it is faster than existing iterative filtering methods that incorporate active learning. Moreover, what the thesis actually proposes is a new iterative framework in which the component algorithms can be combined flexibly to achieve the desired effect. To verify the effectiveness of the method, an empirical analysis was carried out on a real data set from a public database, and evaluation metrics were designed to assess the results in three respects: label quality, model performance, and inspection cost. The experimental results show that the method performs well when the data set is large and the noise level is high.
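The abstract names the two steps of each iteration but not the concrete noise detector or active-learning criterion. The following Python sketch assumes a k-nearest-neighbour disagreement test for "noise sample recognition and filtering" and uncertainty sampling with a simulated labelling oracle for "active learning to screen important samples". The function names (`make_data`, `iterative_filter`, `noise_rate`), the parameters `k`, `budget` and `max_rounds`, and the synthetic two-cluster data are all hypothetical illustrations, not the thesis's implementation.

```python
import random


def make_data(n, flip_prob, seed=0):
    """Synthetic data (hypothetical): two Gaussian blobs in 2-D, labels flipped with prob flip_prob."""
    rng = random.Random(seed)
    X, y_true = [], []
    for i in range(n):
        c = i % 2
        centre = 0.0 if c == 0 else 3.0
        X.append((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)))
        y_true.append(c)
    y_noisy = [1 - c if rng.random() < flip_prob else c for c in y_true]
    return X, y_true, y_noisy


def knn_vote(X, y, idx, active, k):
    """Share of the k nearest active neighbours of X[idx] (itself excluded) labelled 1."""
    xi = X[idx]
    dists = sorted(
        ((X[j][0] - xi[0]) ** 2 + (X[j][1] - xi[1]) ** 2, j)
        for j in active
        if j != idx
    )
    neighbours = [j for _, j in dists[:k]]
    if not neighbours:  # nothing to compare against: treat the label as agreed
        return float(y[idx])
    return sum(y[j] for j in neighbours) / len(neighbours)


def iterative_filter(X, y_noisy, oracle, k=7, budget=10, max_rounds=5):
    """Hypothetical two-step loop: flag suspects, query the most uncertain, drop the rest."""
    y = list(y_noisy)
    active = set(range(len(X)))  # samples currently kept in the training set
    inspected = set()            # samples whose label was confirmed by the oracle
    cost = 0                     # inspection cost = number of oracle queries
    for _ in range(max_rounds):
        # Step 1: noise recognition (assumed): label disagrees with the k-NN majority vote.
        suspects = []
        for i in sorted(active - inspected):
            p1 = knn_vote(X, y, i, active, k)
            if (1 if p1 > 0.5 else 0) != y[i]:
                uncertainty = 1.0 - 2.0 * abs(p1 - 0.5)  # 1 = split vote, 0 = unanimous
                suspects.append((uncertainty, i))
        if not suspects:
            break
        # Step 2: active learning (assumed): send the most uncertain suspects to the oracle.
        suspects.sort(key=lambda t: (-t[0], t[1]))
        for _, i in suspects[:budget]:
            y[i] = oracle(i)
            inspected.add(i)
            cost += 1
        # Filtering: suspects not selected for inspection leave the training set.
        for _, i in suspects[budget:]:
            active.discard(i)
    return active, y, cost


def noise_rate(indices, y, y_true):
    """Fraction of the given samples whose label differs from the ground truth."""
    idx = list(indices)
    return sum(y[i] != y_true[i] for i in idx) / len(idx)


if __name__ == "__main__":
    X, y_true, y_noisy = make_data(200, flip_prob=0.3, seed=1)
    before = noise_rate(range(len(X)), y_noisy, y_true)
    active, y_clean, cost = iterative_filter(X, y_noisy, oracle=lambda i: y_true[i])
    after = noise_rate(active, y_clean, y_true)
    print(f"noise rate {before:.2f} -> {after:.2f}, "
          f"kept {len(active)}/{len(X)}, inspected {cost}")
```

In this sketch, suspects picked by the active-learning step are relabelled by the oracle instead of deleted, which is one possible reading of the abstract's claim that the method avoids losing important information. The `cost` counter stands in for the inspection-cost criterion and `noise_rate` on the retained set for label quality; model performance could be measured by training any classifier on the retained samples.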
Keywords/Search Tags: label noise, active learning, iterative filtering