
Research on Sequence Modeling and Model Lightweighting Based on the Attention Mechanism

Posted on: 2022-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Xie
Full Text: PDF
GTID: 2518306566977829
Subject: Master of Engineering
Abstract/Summary:
With the advent of the big-data era, analyzing data item by item manually has become impractical, making intelligent analysis and classification of information an urgent need. Sequence data (such as text and time series) is a pivotal part of big data, and research on its intelligent analysis and classification modeling is of great significance. With the development of deep neural networks, model accuracy on sequence problems has improved remarkably, but the long-distance dependency problem of very long sequences has not been fully solved. In recent years, neural networks based on the attention mechanism have been able to address long-distance dependencies effectively by focusing on salient features in the sequence and reducing the noise introduced by uninformative ones. However, while solving long-distance dependencies and pursuing high accuracy, the parameter scale of neural network models has grown rapidly; models with hundreds of millions of parameters are now commonplace, which places great pressure on commercial deployment.

This thesis studies many-to-one sequence modeling tasks. First, a global attention classifier (GAC) based on start tags is designed: by inserting start tags into the sequence, the classifier can effectively capture long-distance dependencies, and by using the start tags to compute class-specific attention representations, the model achieves better convergence. Second, Bidirectional Long Short-Term Memory (Bi-LSTM) and Only-Encoder Transformer (OE-Transformer) models are improved with the GAC, and two sentiment analysis tasks of different difficulty are designed to compare the performance of classification models with and without the attention mechanism. The experiments show that models with the attention mechanism perform better on the harder task, and the OE-Transformer with GAC performs best. Finally, this thesis also optimizes the knowledge distillation process: taking the GAC-equipped Bi-LSTM as the student network and the GAC-equipped OE-Transformer as the teacher network, the trained teacher relabels and packages the data set in advance, so that distillation training requires only the participation of the student network, greatly reducing hardware pressure. Experiments show that proper use of knowledge distillation can effectively handle long-distance dependencies and improve model accuracy while keeping the parameter scale low.
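The abstract does not give the exact formulation of the GAC. As a minimal PyTorch sketch, assuming one learnable start tag per class that attends over the encoder output (the tag names, dimensions, and single-head attention are illustrative assumptions, not the thesis's verified design):

```python
import torch
import torch.nn as nn

class GlobalAttentionClassifier(nn.Module):
    """Hypothetical sketch of a start-tag global attention classifier (GAC).

    One learnable start-tag embedding per class attends over the whole
    encoded sequence; each class is scored from its attended representation.
    """

    def __init__(self, hidden_dim: int, num_classes: int):
        super().__init__()
        # one learnable start tag per class (assumption)
        self.start_tags = nn.Parameter(torch.randn(num_classes, hidden_dim))
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads=1, batch_first=True)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, encoded: torch.Tensor) -> torch.Tensor:
        # encoded: (batch, seq_len, hidden_dim) from e.g. a Bi-LSTM or OE-Transformer
        batch = encoded.size(0)
        tags = self.start_tags.unsqueeze(0).expand(batch, -1, -1)  # (batch, C, H)
        # each class tag attends globally over the sequence
        class_repr, _ = self.attn(query=tags, key=encoded, value=encoded)
        # one logit per class from its class-specific attention representation
        return self.score(class_repr).squeeze(-1)  # (batch, C)
```

Used as `logits = gac(encoder(x))`, the attention lets each class representation draw on tokens anywhere in the sequence, which is how a start-tag classifier can capture long-distance dependencies.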
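The distillation optimization described above (teacher relabels and packages the data set once, then only the student trains) can be sketched as follows. This assumes standard soft-label distillation with a temperature and a mixing weight; the function names and hyperparameters are illustrative, not taken from the thesis:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def relabel_dataset(teacher, loader, temperature=2.0):
    """Run the trained teacher once over the data and cache its soft labels,
    so later distillation training needs only the student in memory."""
    teacher.eval()
    packaged = []
    for x, y in loader:
        soft = F.softmax(teacher(x) / temperature, dim=-1)
        packaged.append((x, y, soft))
    return packaged

def distill_step(student, optimizer, x, y, soft, temperature=2.0, alpha=0.5):
    """One student update mixing hard-label loss with KL divergence
    to the cached teacher soft labels (teacher is never loaded here)."""
    logits = student(x)
    hard_loss = F.cross_entropy(logits, y)
    soft_loss = F.kl_div(
        F.log_softmax(logits / temperature, dim=-1),
        soft,
        reduction="batchmean",
    ) * temperature ** 2
    loss = alpha * hard_loss + (1 - alpha) * soft_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the teacher's outputs are computed and stored ahead of time, the training loop holds only the small student network, which matches the abstract's claim of reduced hardware pressure.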
Keywords: Long-distance Dependency, Attention Mechanism, Knowledge Distillation, Transformer, Sentiment Analysis