
Research On Text Classification Based On Self-attention Encoder Model

Posted on: 2021-01-30    Degree: Master    Type: Thesis
Country: China    Candidate: X Y Yang    Full Text: PDF
GTID: 2428330626960372    Subject: Computer Science and Technology
Abstract/Summary:
Text classification is a fundamental task in many natural language processing applications. Faced with the explosive growth of text resources on the Internet, using text classification technology to manage and organize these resources reasonably and efficiently, and to tap their potential commercial value, has become increasingly important. In recent years, deep learning has made tremendous progress in feature extraction and representation and has achieved satisfactory results in many areas of natural language processing. In particular, the Transformer model, whose structure differs completely from earlier models, has achieved striking results in machine translation and attracted considerable attention. The Transformer uses a self-attention mechanism to capture long-range dependencies in language representation and increases model depth through residual connections, which improves its expressive power on large-scale data. However, the Transformer model still faces many problems in text classification tasks. This thesis studies the problem from two aspects: model structure improvement and model distillation.

In terms of model structure improvement, the Transformer performs well as an encoder-decoder model in machine translation. However, experiments show that when the Transformer is used directly as an encoder for text classification on small-sample data, the model is too complex and easily overfits, so its results fall short of those of a traditional shallow recurrent network (RNN). To address this problem, building on the Transformer encoding blocks composed of the Transformer's components, this thesis proposes a "hermit crab" strategy that replaces the self-attention mechanism in the Transformer with a bidirectional RNN; the RNN is integrated in turn with each remaining Transformer component to form a new model (sketched below). Experimental results show that the model's classification accuracy on the task improves, with the gain coming mainly from the multi-head and multi-layer attention structure.

In terms of model distillation, the deep language representation model BERT, which uses the Transformer as its basic structure, is complex, computationally expensive, and slow at inference. This thesis therefore builds a multi-domain adaptive knowledge distillation framework: the BERT models fine-tuned in multiple domains are combined as a teacher model, and with the help of soft labels and hard labels across the different domains, a single student model suitable for multiple domains is trained. The distillation objectives under this framework include word embedding layer distillation, encoding layer distillation (attention distillation and hidden state distillation), and output prediction layer distillation (a sketch of such a combined loss also follows the abstract). Experimental results verify that knowledge distillation can effectively transfer the teacher model's generalization ability to the student model; multi-domain distillation makes the student model more versatile, and its classification accuracy on three tasks is further improved.

Through research on the Transformer model in the above two aspects, this thesis fully taps the Transformer's potential for text classification and improves the performance of current deep models on this task.
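The following is a minimal sketch of the "hermit crab" idea described in the abstract: keep the Transformer block's residual connections, layer normalization, and feed-forward sublayer, but swap the self-attention sublayer for a bidirectional RNN. Class names, dimensions, and the choice of a GRU are illustrative assumptions, not details taken from the thesis.

```python
# Hypothetical sketch: a Transformer-style encoder block with the
# self-attention sublayer replaced by a bidirectional GRU.
import torch
import torch.nn as nn


class BiRNNEncoderBlock(nn.Module):
    def __init__(self, d_model: int = 256, d_ff: int = 1024, dropout: float = 0.1):
        super().__init__()
        # Bidirectional GRU stands in for multi-head self-attention;
        # hidden size is halved so the concatenated output stays at d_model.
        self.birnn = nn.GRU(d_model, d_model // 2, batch_first=True,
                            bidirectional=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        rnn_out, _ = self.birnn(x)
        x = self.norm1(x + self.dropout(rnn_out))      # residual + layer norm
        x = self.norm2(x + self.dropout(self.ffn(x)))  # feed-forward sublayer
        return x


# Example: encode 8 sequences of length 32, then mean-pool for a classifier head.
block = BiRNNEncoderBlock()
features = block(torch.randn(8, 32, 256)).mean(dim=1)  # (8, 256)
```

Keeping the residual and normalization structure while only swapping the attention sublayer is one way to read "the RNN is integrated in turn with each remaining Transformer component"; the thesis may combine the components differently.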
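Below is a hedged sketch of the kind of distillation objective the abstract outlines: a soft-label term (temperature-scaled KL divergence against teacher logits), a hard-label term (cross entropy against gold labels), and an intermediate-layer term (MSE on hidden states, with attention-matrix distillation following the same pattern). The weights, temperature, and function names are illustrative assumptions, not the thesis's exact hyper-parameters.

```python
# Hypothetical sketch of a combined knowledge-distillation loss.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels,
                      student_hidden, teacher_hidden,
                      temperature: float = 2.0,
                      alpha: float = 0.5, beta: float = 0.1):
    # Soft labels: KL divergence between temperature-softened distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard labels: standard cross entropy against the ground-truth classes.
    hard = F.cross_entropy(student_logits, labels)
    # Intermediate layers: match student and teacher hidden states.
    hidden = F.mse_loss(student_hidden, teacher_hidden)
    return alpha * soft + (1 - alpha) * hard + beta * hidden


# Example with dummy tensors: 4 examples, 3 classes, 256-dim hidden states.
s_logits, t_logits = torch.randn(4, 3), torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 1])
s_hid, t_hid = torch.randn(4, 256), torch.randn(4, 256)
loss = distillation_loss(s_logits, t_logits, labels, s_hid, t_hid)
```

In a multi-domain setting, the teacher logits and hidden states would come from the domain-specific fine-tuned BERT teacher for each example's domain, while a single student is updated across all domains.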
Keywords/Search Tags: RNN, Transformer, BERT, Knowledge distillation, Text Classification