Text classification is a fundamental problem in Natural Language Processing (NLP). The most popular approach is to train deep neural network models on large amounts of data. However, acquiring large amounts of data can require substantial labor, time, and other costs, and the accuracy of the data labels cannot always be guaranteed. Lower-cost deep learning approaches, such as semi-supervised learning and learning with noisy labels, have therefore emerged. Building on the BERT (Bidirectional Encoder Representations from Transformers) architecture, this paper investigates both semi-supervised learning and learning with noisy labels.

(1) To address excessive semantic drift and noisy labels that arise during text data augmentation in semi-supervised learning, this paper proposes Adversarial Semi-Supervised Learning (ASSL), a text classification method that is robust to noisy labels. The main contributions of ASSL are an improved adversarial data augmentation method and an improved loss function. For adversarial data augmentation, this paper proposes the Maximum Hidden Gradient Descent (MHGD) method, which applies the strongest adversarial perturbation to a specific semantic representation tensor in a Transformer hidden layer of the BERT model. This improves the model's consistency and reduces the risk of drastic semantic changes. For the loss function, this paper proposes Flex Symmetric Cross Entropy (Flex-SCE), which combines cross entropy with reverse cross entropy as a noise-tolerance term. Flex-SCE dynamically reduces the influence of labeled data as training progresses. As a result, ASSL minimizes overfitting to the limited labeled data, especially to noisy labels. Experiments show that, compared with several state-of-the-art semi-supervised learning methods, ASSL achieves strong performance on multiple datasets and substantially reduces the impact of noisy labels.

(2) For few-shot text classification with noisy labels, this paper proposes Few CL, an improved few-shot noisy-label learning method based on Confident Learning (CL). Confident learning is well suited to filtering and denoising large-scale datasets. When samples are few, however, its rule of using the mean predicted probability of each class as the decision threshold leaves room for improvement. On the one hand, an underfitted model may discard difficult samples too readily, leaving too little data or an imbalanced class distribution; on the other hand, a model trained on data that contains wrong labels is not necessarily reliable at judging whether a label is correct. Unlike confident learning, which uses the model's predicted probabilities to decide whether a label is correct, this method uses BERT to encode sentence semantics, mapping each sentence into a high-dimensional space to obtain a semantic representation tensor, and derives a 'prototype' tensor for each class. Following the idea of clustering, it computes the 'distance' from each semantic representation tensor to the prototype of the class it is labeled with. Then, based on each distance and the mean and standard deviation of the distances, the samples are divided into three groups: wrong labels, correct hard labels, and correct easy labels. After the wrongly labeled samples are filtered out, label confidence is measured by the distance, and the remaining samples are weighted accordingly for subsequent model training. Experiments show that, compared with standard supervised training, Symmetric Cross Entropy, and Confident Learning, this method achieves the best training results across multiple datasets and multiple label error rates.
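The abstract does not spell out the Flex-SCE formula, so the following is only an illustrative sketch. It implements the standard Symmetric Cross Entropy in NumPy (a weighted sum of cross entropy and reverse cross entropy, where the reverse term swaps prediction and one-hot label and clamps log 0 to a constant), then scales it by a hypothetical `flex_weight` schedule that linearly shrinks the labeled-data loss over training. The names `sce_loss` and `flex_weight`, the defaults for `alpha`, `beta`, `log_zero`, and `w_min`, and the linear decay are all assumptions, not the method proposed in this paper.

```python
import numpy as np

def softmax(logits):
    # Subtract the row max for numerical stability
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sce_loss(logits, labels, alpha=1.0, beta=1.0, log_zero=-4.0, eps=1e-7):
    """Standard Symmetric Cross Entropy: mean of alpha * CE + beta * RCE."""
    p = np.clip(softmax(np.asarray(logits, dtype=float)), eps, 1.0)
    labels = np.asarray(labels)
    p_y = p[np.arange(len(labels)), labels]  # predicted prob. of the given label
    ce = -np.log(p_y)
    # Reverse CE: -sum_k p_k * log(q_k) with one-hot q; log(0) is clamped to
    # log_zero, so the sum reduces to -log_zero * (1 - p_y).
    rce = -log_zero * (1.0 - p_y)
    return float(np.mean(alpha * ce + beta * rce))

def flex_weight(step, total_steps, w_min=0.1):
    """Hypothetical schedule: decay the labeled-loss weight linearly from 1 to w_min."""
    frac = min(step / max(total_steps, 1), 1.0)
    return 1.0 - (1.0 - w_min) * frac

def flex_sce_loss(logits, labels, step, total_steps):
    """Hypothetical Flex-SCE stand-in: SCE scaled by the decaying weight."""
    return flex_weight(step, total_steps) * sce_loss(logits, labels)
```

In a semi-supervised loop, `flex_sce_loss` would be combined with whatever unsupervised loss the training uses; because the weight decays, possibly noisy labeled examples contribute less late in training, while the bounded reverse term (at most `-log_zero`) limits how hard any single wrong label can pull on the model.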
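The distance-based three-way split in Few CL can be sketched in NumPy as follows. This is a minimal sketch under several assumptions that the abstract does not confirm: sentence embeddings are already computed (for example with BERT), a class prototype is the mean embedding of the samples carrying that label, 'distance' is Euclidean, and the mean and standard deviation are taken per class. The function name `prototype_split`, the z-score cut-offs `k_hard` and `k_err`, and the exponential weighting of hard samples are hypothetical choices.

```python
import numpy as np

def prototype_split(emb, labels, k_hard=1.0, k_err=2.0):
    """Split samples into 'error' / 'hard' / 'easy' by distance to their class prototype.

    emb:    (n, d) sentence embeddings, assumed precomputed (e.g. with BERT)
    labels: (n,) observed integer labels, possibly noisy
    k_hard, k_err: hypothetical z-score cut-offs (not taken from the source)
    Returns (groups, weights); weight 0 marks a sample to filter out.
    """
    emb = np.asarray(emb, dtype=float)
    labels = np.asarray(labels)
    z = np.zeros(len(labels))
    for c in np.unique(labels):
        mask = labels == c
        prototype = emb[mask].mean(axis=0)  # class 'prototype' = mean embedding
        d = np.linalg.norm(emb[mask] - prototype, axis=1)  # Euclidean 'distance'
        mu, sigma = d.mean(), d.std()
        # Standardize distances within the class; identical distances give z = 0
        z[mask] = (d - mu) / sigma if sigma > 0 else 0.0
    groups = np.where(z > k_err, "error", np.where(z > k_hard, "hard", "easy"))
    # Hypothetical weighting: easy = 1, hard decays with distance, error = 0
    weights = np.where(groups == "easy", 1.0, np.exp(-(z - k_hard)))
    weights[groups == "error"] = 0.0
    return groups, weights
```

Samples with weight 0 would be dropped, and the remaining weights would scale each sample's loss in later training. Because the prototype is built from the observed labels, a mislabeled sample also pulls its prototype slightly toward itself; with very few samples per class, this limits how large its z-score can become, which is one reason the cut-offs would need tuning.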