
Research On Unbalanced Text Classification Based On Text Augmentation And Semi-Supervised Learning

Posted on: 2023-07-27
Degree: Master
Type: Thesis
Country: China
Candidate: J L Zhou
GTID: 2558307070983299
Subject: Signal and Information Processing
Abstract/Summary:
In recent years, unbalanced text classification in natural language processing has received extensive attention, and much research has been devoted to it. On the one hand, scholars apply text augmentation algorithms to unbalanced datasets to add training samples and give the model richer information, which can improve its performance and generalization ability. On the other hand, since supervised learning requires a large amount of labeled data, researchers also use semi-supervised learning for unbalanced text classification to cope with the difficulty of acquiring labeled data in practical scenarios. Building on these two points, this thesis addresses unbalanced text classification from two aspects: text augmentation and semi-supervised learning. The main work is as follows:

(1) Four limited text augmentation (LTA) algorithms are proposed for unbalanced text classification. An algorithm for mining critical data is designed, and a method for automatically mining characteristic words is implemented; together these support LTA.

(2) A semi-supervised network model, LTA-SSL, is proposed for unbalanced text classification. It combines LTA with the characteristics of semi-supervised learning to further improve performance on unbalanced text classification.

(3) To verify the effectiveness of the proposed text augmentation technique, multiple public datasets are selected for comparative experiments against existing text augmentation methods, and ablation experiments are designed to verify the rationality of the selected parameters. To verify the effectiveness of the proposed semi-supervised model, several existing semi-supervised text classification networks are used for comparative experiments.

The experimental results show that, compared with training on the original dataset, LTA increases model accuracy by an average of 2.13%, versus 1.08% for RTA. When the amount of labeled data is severely limited, LTA-SSL improves on the results of existing semi-supervised models. When the number of labeled samples is sufficient, the maximum accuracy gap between the LTA-SSL network and a supervised learning network is only 4.9%.
Keywords/Search Tags: Natural Language Processing, Unbalanced Text, Text Augmentation, Semi-Supervised Learning