
Research On Text Classification Based On Self-attention Encoder Model

Posted on: 2021-01-30    Degree: Master    Type: Thesis
Country: China    Candidate: X Y Yang    Full Text: PDF
GTID: 2428330626960372    Subject: Computer Science and Technology
Abstract/Summary:
Text classification is a fundamental task in many natural language processing applications. Faced with the explosive growth of text resources on the Internet, using text classification technology to manage and organize these resources reasonably and efficiently, and to tap their potential commercial value, has become increasingly important. In recent years, deep learning has made tremendous progress in feature extraction and representation and has achieved satisfactory results in many areas of natural language processing. In particular, the Transformer model, whose structure differs completely from earlier models, has achieved striking results in machine translation and attracted considerable attention. The Transformer uses a self-attention mechanism to capture long-range dependencies in language representation and increases model depth through residual connections, which improves its expressive power on large-scale data. However, the Transformer model still faces many problems in text classification tasks. This thesis studies the problem from two aspects: model structure improvement and model distillation.

In terms of model structure improvement, the Transformer performs well as an encoder-decoder model in machine translation. However, experiments show that when the Transformer is used directly as an encoder for text classification on small-sample data, the model is too complex and easily overfits, so its results fall short of those of a traditional shallow recurrent network (RNN). To address this problem, building on the Transformer encoding blocks composed of the Transformer's components, this thesis proposes a "hermit crab" strategy that replaces the self-attention mechanism in the Transformer with a bidirectional RNN; the RNN is integrated in turn with each remaining Transformer component to form a new model (sketched below). Experimental results show that the model's classification accuracy on the task improves, with the gain coming mainly from the multi-head and multi-layer attention structure.

In terms of model distillation, the deep language representation model BERT, which uses the Transformer as its basic structure, is complex, computationally expensive, and slow at inference. This thesis therefore builds a multi-domain adaptive knowledge distillation framework: the BERT models fine-tuned in multiple domains are combined as a teacher model, and with the help of soft labels and hard labels across the different domains, a single student model suitable for multiple domains is trained. The distillation objectives under this framework include word embedding layer distillation, encoding layer distillation (attention distillation and hidden state distillation), and output prediction layer distillation (a sketch of such a combined loss also follows the abstract). Experimental results verify that knowledge distillation can effectively transfer the teacher model's generalization ability to the student model; multi-domain distillation makes the student model more versatile, and its classification accuracy on three tasks is further improved.

Through research on the Transformer model in the above two aspects, this thesis fully taps the Transformer's potential for text classification and improves the performance of current deep models on this task.
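The following is a minimal sketch of the "hermit crab" idea described in the abstract: keep the Transformer block's residual connections, layer normalization, and feed-forward sublayer, but swap the self-attention sublayer for a bidirectional RNN. Class names, dimensions, and the choice of a GRU are illustrative assumptions, not details taken from the thesis.

```python
# Hypothetical sketch: a Transformer-style encoder block with the
# self-attention sublayer replaced by a bidirectional GRU.
import torch
import torch.nn as nn


class BiRNNEncoderBlock(nn.Module):
    def __init__(self, d_model: int = 256, d_ff: int = 1024, dropout: float = 0.1):
        super().__init__()
        # Bidirectional GRU stands in for multi-head self-attention;
        # hidden size is halved so the concatenated output stays at d_model.
        self.birnn = nn.GRU(d_model, d_model // 2, batch_first=True,
                            bidirectional=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        rnn_out, _ = self.birnn(x)
        x = self.norm1(x + self.dropout(rnn_out))      # residual + layer norm
        x = self.norm2(x + self.dropout(self.ffn(x)))  # feed-forward sublayer
        return x


# Example: encode 8 sequences of length 32, then mean-pool for a classifier head.
block = BiRNNEncoderBlock()
features = block(torch.randn(8, 32, 256)).mean(dim=1)  # (8, 256)
```

Keeping the residual and normalization structure while only swapping the attention sublayer is one way to read "the RNN is integrated in turn with each remaining Transformer component"; the thesis may combine the components differently.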
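Below is a hedged sketch of the kind of distillation objective the abstract outlines: a soft-label term (temperature-scaled KL divergence against teacher logits), a hard-label term (cross entropy against gold labels), and an intermediate-layer term (MSE on hidden states, with attention-matrix distillation following the same pattern). The weights, temperature, and function names are illustrative assumptions, not the thesis's exact hyper-parameters.

```python
# Hypothetical sketch of a combined knowledge-distillation loss.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels,
                      student_hidden, teacher_hidden,
                      temperature: float = 2.0,
                      alpha: float = 0.5, beta: float = 0.1):
    # Soft labels: KL divergence between temperature-softened distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard labels: standard cross entropy against the ground-truth classes.
    hard = F.cross_entropy(student_logits, labels)
    # Intermediate layers: match student and teacher hidden states.
    hidden = F.mse_loss(student_hidden, teacher_hidden)
    return alpha * soft + (1 - alpha) * hard + beta * hidden


# Example with dummy tensors: 4 examples, 3 classes, 256-dim hidden states.
s_logits, t_logits = torch.randn(4, 3), torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 1])
s_hid, t_hid = torch.randn(4, 256), torch.randn(4, 256)
loss = distillation_loss(s_logits, t_logits, labels, s_hid, t_hid)
```

In a multi-domain setting, the teacher logits and hidden states would come from the domain-specific fine-tuned BERT teacher for each example's domain, while a single student is updated across all domains.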
Keywords/Search Tags: RNN, Transformer, BERT, Knowledge distillation, Text Classification