With the rapid development of Internet technology, the ways in which people access data have become diverse and convenient. However, because the acquired data are diverse and complex, their quality is difficult to guarantee, and collected data often contain noise. Depending on where it occurs, noise can be divided into feature noise and label noise. Feature noise can already be detected and handled well, whereas label noise has been studied less thoroughly, so handling label noise is currently the more pressing problem. Handling label noise is also known as Label Noise Learning (LNL); its purpose is to reduce the impact of label noise on model building and decision-making. This paper addresses classification tasks in label noise learning. By analyzing how different data samples are distributed in the feature space and how the loss values predicted by the model differ between samples, it reclassifies the data samples and systematically studies label noise learning at both the data level and the model level. The specific work is summarized as follows:

(1) At the data level of label noise learning, a label noise filtering framework based on anomaly detection is proposed, and on top of this framework a label noise filtering algorithm, Ad NN (Label Noise Filtering via Adaptive Nearest Neighbor Clustering), is proposed. Although many label noise filtering algorithms exist, most do not consider the effectiveness of filtering and therefore filter out too many clean samples, which leads to severe over-cleaning. In contrast to most existing methods, the proposed algorithm analyzes the similarities and differences between outliers and label noise to convert label noise detection into outlier detection, and then identifies label noise among the outliers using relative density and noise factors. This multi-layer filtering method effectively reduces the impact of label noise, reduces over-cleaning, and improves the effectiveness of label noise filtering.

(2) At the model level of label noise learning, a novel adversarial training algorithm for label noise, CNLAS (Combining Noise Labels by An Adversarial Training Method with Anomalous Samples), is proposed. Many existing label-noise-robust algorithms exploit the small-loss property of neural networks: they treat small-loss samples directly as clean samples and train the network on them in order to achieve robustness to label noise. During actual training, however, the small-loss samples also include samples with noisy labels, and the large-loss samples also include clean samples. To solve this problem, the proposed algorithm analyzes how the loss of every sample changes over the entire training period, taking samples with a small average loss over that period as the clean set and samples with a large average loss as the label noise set. Positive learning and negative learning are then performed on the clean set and the label noise set, respectively, and a self-training method is added during training to improve the robustness of the model to label noise.

This paper systematically studies classification tasks in label noise learning and proposes, at the data level and the model level respectively, a label noise filtering framework based on anomaly detection and an adversarial training algorithm for label noise that uses anomalous samples. Together these provide a new approach to effective label noise filtering and to the design of label-noise-robust models. The results are of some significance and practical value for research on label noise learning.
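The average-loss split described in (2) can be illustrated with a minimal sketch. It is not the CNLAS implementation: the function name `split_by_average_loss`, the `noise_fraction` parameter, and the rule of cutting at a fixed fraction are assumptions made for illustration only, since the abstract does not state how the boundary between "small" and "large" average loss is chosen.

```python
def split_by_average_loss(loss_history, noise_fraction):
    """Split sample indices into a clean set and a label noise set.

    loss_history: list of epochs, each a list of per-sample losses
                  (all epochs must cover the same samples, in the same order).
    noise_fraction: assumed share of samples to place in the label noise set
                    (hypothetical parameter, not taken from the paper).
    """
    if not loss_history:
        raise ValueError("loss_history must contain at least one epoch")
    n_samples = len(loss_history[0])
    if any(len(epoch) != n_samples for epoch in loss_history):
        raise ValueError("every epoch must record a loss for every sample")
    if not 0.0 <= noise_fraction <= 1.0:
        raise ValueError("noise_fraction must lie in [0, 1]")

    # Average each sample's loss over the whole training period.
    n_epochs = len(loss_history)
    avg_loss = [
        sum(epoch[i] for epoch in loss_history) / n_epochs
        for i in range(n_samples)
    ]

    # Rank samples from smallest to largest average loss.
    order = sorted(range(n_samples), key=lambda i: avg_loss[i])

    # The samples with the largest average loss form the label noise set.
    n_noisy = round(noise_fraction * n_samples)
    n_clean = n_samples - n_noisy
    clean_set = sorted(order[:n_clean])
    noise_set = sorted(order[n_clean:])
    return clean_set, noise_set, avg_loss


# Toy loss record: 3 epochs, 4 samples. In the second epoch, sample 1 happens
# to have a small loss and sample 3 a large one, so a single-epoch small-loss
# rule applied at that epoch would misjudge both.
history = [
    [0.2, 2.5, 0.3, 0.1],
    [0.1, 0.2, 0.2, 1.9],
    [0.1, 2.4, 0.1, 0.1],
]
clean, noisy, averages = split_by_average_loss(history, noise_fraction=0.25)
print(clean, noisy)  # [0, 2, 3] [1]
```

In this toy record, averaging over all epochs keeps sample 3 in the clean set despite its one large loss and still assigns sample 1 to the label noise set despite its one small loss, which is the motivation the abstract gives for using the whole training period rather than a single epoch.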