
Text Classification Research Based On Attention Mechanism

Posted on: 2020-12-18
Degree: Master
Type: Thesis
Country: China
Candidate: X C Xu
Full Text: PDF
GTID: 2428330596975065
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technology and the Internet, text-based information of many kinds emerges in media such as blogs, microblogs, and news. To make better use of this information, text classification has become an active area of research. Based on an in-depth study of attention mechanisms and semantic information, this thesis proposes Self-Attention Networks, Multi-dimensional Self-Attention Networks, a semantic-based HAN model, and semantic-based Self-Attention Networks, which improve the accuracy of text classification.

First, this thesis analyzes the text classification model HAN in depth and discusses several attention mechanisms. Combining HAN's two-layer sentence-document framework with the strong information-extraction ability of self-attention, it proposes Self-Attention Networks (SAN) and Multi-dimensional Self-Attention Networks (MSAN). Building on HAN and SAN, it further analyzes the effective information contained in the low-level semantics of text and proposes a semantic-based HAN model (SHAN) and semantic-based Self-Attention Networks (SSAN). In addition to verifying model performance, experiments were carried out on the roles of word-vector initialization and sequence information.

On the 20 Newsgroups English dataset, the four models achieved classification accuracies of 75.4%, 75.3%, 74.8%, and 75.3%, respectively, exceeding HAN's 74.2% and LEAM's 74.2%. On the Fudan News Chinese dataset, the four models achieved 95.7%, 95.8%, 96.0%, and 95.9%, respectively, all exceeding HAN's 95.1% and LEAM's 95.0%. The word-vector initialization experiments show that pre-trained word vectors greatly improve the performance of a text classification model compared with randomly initialized word vectors. The sequence-information experiments show that adding sequence information improves classification accuracy, and that the self-attention mechanism is more stable than RNN-based network structures.
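The core building block shared by the proposed SAN, MSAN, and SSAN models is the self-attention mechanism, in which each word representation is re-weighted by its similarity to every other word in the sequence. The following is a minimal NumPy sketch of generic scaled dot-product self-attention, not the thesis's exact architecture; the function name and the use of the embeddings themselves as queries, keys, and values are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """Scaled dot-product self-attention over a sequence of word vectors.

    X: (seq_len, d) array of word embeddings.
    Returns: (seq_len, d) context-aware representations, where each output
    row is a convex combination of the input rows.
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)        # pairwise similarity between positions
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ X                   # attention-weighted mixture of words

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))              # 5 words, 8-dimensional embeddings
H = self_attention(X)
print(H.shape)                           # (5, 8)
```

In a two-layer sentence-document framework such as HAN's, a block like this would first produce word-level representations that are pooled into sentence vectors, and the same mechanism would then be applied over the sentence vectors to produce a document representation for classification.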
Keywords/Search Tags:text classification, attention mechanism, word vector initialization, sequence information