
Algorithm And Application Of Text Classification Based On Transformer

Posted on: 2022-09-30
Degree: Master
Type: Thesis
Country: China
Candidate: C Q Sun
Full Text: PDF
GTID: 2518306608490204
Subject: Automation Technology

Abstract/Summary:
In recent years, with the rapid development of the Internet and the surge in the volume of online text, the demand for text classification technology has grown significantly. Transformer-based pre-trained models such as BERT and RoBERTa have achieved satisfactory results in text classification and many other natural language processing tasks, but they still have the following shortcomings: 1) Transformer imposes a limit on input length, so directly truncating longer inputs loses information, and batch training leaves the [CLS] token carrying redundant information, which degrades accuracy when it is used as the classification feature; 2) the large number of parameters in a pre-trained model places high demands on hardware computing power, and model inference is slow; 3) the model usually needs a large amount of labeled data to achieve good results. In view of these problems and the challenges faced when applying such models to text classification, the main research work is as follows:

(1) To address the loss of text information caused by crude truncation, and the redundant information in the [CLS] token that degrades classification accuracy, a method combining head-and-tail truncation with mean and max pooling is proposed. The beginning and end of a text carry richer semantic and emotional information; max pooling filters out redundant information across tokens to retain the most salient features of the input sequence, while mean pooling avoids losing information about the text as a whole. Head-and-tail truncation together with mean and max pooling improves the performance of BERT on text classification tasks (see Sketch 1 below).

(2) To address the long inference time caused by the large number of model parameters, a BERT model distilled for the classification task is proposed. Starting from the improved classification-oriented BERT model, knowledge distillation is used to compress the large model into a small one, which is then applied to the text classification task. The small model distilled from the improved BERT model also significantly improves inference speed (see Sketch 2 below).

(3) To address the model's reliance on large amounts of labeled data, the RoBERTa-GAN model is proposed. RoBERTa and GAN (Generative Adversarial Networks) are fused to perform semi-supervised learning: the generator produces fake samples close to the real data distribution, RoBERTa supplies high-quality text representations, and a large amount of unlabeled data then helps the model learn the boundaries of the sample distribution and generalize to the final task (see Sketch 3 below). Realizing semi-supervised learning with a GAN on top of the RoBERTa model reduces the need for labeled data, extends the model's ability to exploit unlabeled data during fine-tuning, and improves performance on text classification tasks.
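Sketch 1. A minimal PyTorch illustration of head-and-tail truncation plus mean/max pooling over BERT outputs. The 128-token head size, the concatenation of the two pooled vectors, and the linear classifier head are illustrative assumptions; the abstract does not fix these details.

    import torch
    import torch.nn as nn
    from transformers import BertModel

    def head_tail_truncate(token_ids, max_len=512, head_len=128):
        # Keep the first `head_len` and the last `max_len - head_len` tokens,
        # since the beginning and end of a text tend to carry the richest
        # semantic and emotional information.
        if len(token_ids) <= max_len:
            return token_ids
        return token_ids[:head_len] + token_ids[-(max_len - head_len):]

    class PooledBertClassifier(nn.Module):
        def __init__(self, num_labels):
            super().__init__()
            self.bert = BertModel.from_pretrained("bert-base-uncased")
            hidden = self.bert.config.hidden_size
            # The mean- and max-pooled vectors are concatenated as the
            # classification feature instead of the raw [CLS] vector.
            self.classifier = nn.Linear(2 * hidden, num_labels)

        def forward(self, input_ids, attention_mask):
            states = self.bert(input_ids=input_ids,
                               attention_mask=attention_mask).last_hidden_state
            mask = attention_mask.unsqueeze(-1).float()
            # Mean pooling preserves information about the whole sequence.
            mean_pool = (states * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
            # Max pooling keeps the strongest activation per dimension,
            # filtering redundant information across tokens.
            max_pool = states.masked_fill(mask == 0, -1e4).max(dim=1).values
            return self.classifier(torch.cat([mean_pool, max_pool], dim=-1))

Under these assumptions, a document longer than 512 tokens would be reduced to its first 128 and last 384 tokens before being encoded and pooled.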
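Sketch 2. A minimal sketch of the knowledge-distillation objective used to compress the classification BERT into a smaller student. The temperature, the loss weighting, and matching only the output logits (rather than intermediate layers) are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          temperature=2.0, alpha=0.5):
        # Soft targets: the student matches the teacher's temperature-smoothed
        # class distribution (scaled by T^2 to keep gradients comparable).
        soft = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                        F.softmax(teacher_logits / temperature, dim=-1),
                        reduction="batchmean") * (temperature ** 2)
        # Hard targets: ordinary cross-entropy on the gold labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1.0 - alpha) * hard

In the training loop, the teacher's logits would be computed under torch.no_grad() so that only the student's parameters are updated; the distilled small model then replaces the teacher at inference time.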
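Sketch 3. A minimal sketch of the RoBERTa-GAN idea in the spirit of GAN-BERT: a generator maps noise to fake sentence representations, and a discriminator over RoBERTa features predicts the k real classes plus one extra "fake" class, so unlabeled examples can be trained merely to not be fake. The layer sizes and the use of the first (<s>) token representation are illustrative assumptions.

    import torch
    import torch.nn as nn
    from transformers import RobertaModel

    class Generator(nn.Module):
        # Maps random noise to fake sentence-level representations that
        # mimic the distribution of real RoBERTa features.
        def __init__(self, noise_dim=100, hidden=768):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(noise_dim, hidden),
                                     nn.LeakyReLU(0.2),
                                     nn.Linear(hidden, hidden))

        def forward(self, z):
            return self.net(z)

    class Discriminator(nn.Module):
        # Scores a representation over num_labels real classes plus one
        # extra class reserved for generated (fake) samples.
        def __init__(self, num_labels, hidden=768):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(hidden, hidden),
                                     nn.LeakyReLU(0.2),
                                     nn.Linear(hidden, num_labels + 1))

        def forward(self, rep):
            return self.net(rep)

    encoder = RobertaModel.from_pretrained("roberta-base")

    def encode(input_ids, attention_mask):
        # The representation of RoBERTa's first (<s>) token serves as the
        # sentence feature fed to the discriminator.
        return encoder(input_ids=input_ids,
                       attention_mask=attention_mask).last_hidden_state[:, 0]

Labeled batches are trained with cross-entropy over the k real classes, unlabeled batches are penalized only for being assigned to the fake class, and the generator is updated to make the discriminator treat its samples as real; this is how the unlabeled data helps shape the class boundaries.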
Keywords/Search Tags:Text classification, Transformer, BERT, RoBERTa, Model compression, Generative Adversarial Networks, Semi-supervised learning