
Research on Sequence Modeling and Model Lightweighting Based on the Attention Mechanism

Posted on: 2022-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Xie
Full Text: PDF
GTID: 2518306566977829
Subject: Master of Engineering
Abstract/Summary:
With the advent of the big-data era, analyzing data item by item manually has become impractical, making intelligent analysis and classification of information an urgent need. Sequence data (such as text and time series) is a pivotal part of big data, and research on its intelligent analysis and classification modeling is of great significance. With the development of deep neural networks, model accuracy on sequence problems has improved remarkably, but the long-distance dependency problem of very long sequences has not been fully solved. In recent years, neural networks based on the attention mechanism have been able to address long-distance dependencies effectively by focusing on salient features in the sequence and reducing the noise introduced by uninformative ones. However, while solving long-distance dependencies and pursuing high accuracy, the parameter scale of neural network models has grown rapidly; models with hundreds of millions of parameters are now commonplace, which places great pressure on commercial deployment.

This thesis studies many-to-one sequence modeling tasks. First, a global attention classifier (GAC) based on start tags is designed: by inserting start tags into the sequence, the classifier can effectively capture long-distance dependencies, and by using the start tags to compute class-specific attention representations, the model achieves better convergence. Second, Bidirectional Long Short-Term Memory (Bi-LSTM) and Only-Encoder Transformer (OE-Transformer) models are improved with the GAC, and two sentiment analysis tasks of different difficulty are designed to compare the performance of classification models with and without the attention mechanism. The experiments show that models with the attention mechanism perform better on the harder task, and the OE-Transformer with GAC performs best. Finally, this thesis also optimizes the knowledge distillation process: taking the GAC-equipped Bi-LSTM as the student network and the GAC-equipped OE-Transformer as the teacher network, the trained teacher relabels and packages the data set in advance, so that distillation training requires only the participation of the student network, greatly reducing hardware pressure. Experiments show that proper use of knowledge distillation can effectively handle long-distance dependencies and improve model accuracy while keeping the parameter scale low.
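The abstract does not give the exact formulation of the GAC. As a minimal PyTorch sketch, assuming one learnable start tag per class that attends over the encoder output (the tag names, dimensions, and single-head attention are illustrative assumptions, not the thesis's verified design):

```python
import torch
import torch.nn as nn

class GlobalAttentionClassifier(nn.Module):
    """Hypothetical sketch of a start-tag global attention classifier (GAC).

    One learnable start-tag embedding per class attends over the whole
    encoded sequence; each class is scored from its attended representation.
    """

    def __init__(self, hidden_dim: int, num_classes: int):
        super().__init__()
        # one learnable start tag per class (assumption)
        self.start_tags = nn.Parameter(torch.randn(num_classes, hidden_dim))
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads=1, batch_first=True)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, encoded: torch.Tensor) -> torch.Tensor:
        # encoded: (batch, seq_len, hidden_dim) from e.g. a Bi-LSTM or OE-Transformer
        batch = encoded.size(0)
        tags = self.start_tags.unsqueeze(0).expand(batch, -1, -1)  # (batch, C, H)
        # each class tag attends globally over the sequence
        class_repr, _ = self.attn(query=tags, key=encoded, value=encoded)
        # one logit per class from its class-specific attention representation
        return self.score(class_repr).squeeze(-1)  # (batch, C)
```

Used as `logits = gac(encoder(x))`, the attention lets each class representation draw on tokens anywhere in the sequence, which is how a start-tag classifier can capture long-distance dependencies.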
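The distillation optimization described above (teacher relabels and packages the data set once, then only the student trains) can be sketched as follows. This assumes standard soft-label distillation with a temperature and a mixing weight; the function names and hyperparameters are illustrative, not taken from the thesis:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def relabel_dataset(teacher, loader, temperature=2.0):
    """Run the trained teacher once over the data and cache its soft labels,
    so later distillation training needs only the student in memory."""
    teacher.eval()
    packaged = []
    for x, y in loader:
        soft = F.softmax(teacher(x) / temperature, dim=-1)
        packaged.append((x, y, soft))
    return packaged

def distill_step(student, optimizer, x, y, soft, temperature=2.0, alpha=0.5):
    """One student update mixing hard-label loss with KL divergence
    to the cached teacher soft labels (teacher is never loaded here)."""
    logits = student(x)
    hard_loss = F.cross_entropy(logits, y)
    soft_loss = F.kl_div(
        F.log_softmax(logits / temperature, dim=-1),
        soft,
        reduction="batchmean",
    ) * temperature ** 2
    loss = alpha * hard_loss + (1 - alpha) * soft_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the teacher's outputs are computed and stored ahead of time, the training loop holds only the small student network, which matches the abstract's claim of reduced hardware pressure.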
Keywords: Long-distance Dependency, Attention Mechanism, Knowledge Distillation, Transformer, Sentiment Analysis