Research On Text Summarization Generation Technology Based On Neural Network

Posted on: 2021-04-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Nie
Full Text: PDF
GTID: 2518306107468044
Subject: Electronics and Communications Engineering
Abstract/Summary:
In daily life, people read more and more but obtain less and less useful information, a problem that essentially stems from information overload. Text summarization technology can help readers quickly grasp the main point of an article and reduce reading costs. Text summarization methods fall into two main categories: extractive and abstractive. Extractive methods, e.g., the TextRank model, select sentences from the original text to form the summary. Abstractive methods mainly rely on the Seq2Seq architecture to generate new sentences as the summary, which generalize the source content rather than copy it. However, the classic Seq2Seq model has several defects: insufficient semantic understanding in the encoder, disordered word order in the generated summary, and out-of-vocabulary words. In response to these problems, this thesis proposes a Seq2Seq model based on a C-RNN encoder, Char-Based word vector input, and an Attention mechanism. The main contents of this thesis are:

1. To address the insufficient semantic understanding of the encoder, this thesis proposes the C-RNN encoder. An RNN is good at learning long-distance dependencies but poor at capturing local semantic features, while a CNN is the opposite. The C-RNN encoder proposed in this thesis combines the advantages of the two to strengthen the semantic understanding of the encoding and improve the quality of the summary.

2. To tackle the word order problem, this thesis introduces the Attention mechanism at the decoder, whose soft alignment property reduces word order errors and improves the summary output.

3. To cope with out-of-vocabulary words, this thesis proposes Char-Based word vector input, whose dictionary is small and which has almost no out-of-vocabulary words. Combined with Char-Based word vectors, the C-RNN encoder can enrich the representation of word vectors by producing multiple vectors for a single character, which enhances semantic understanding and alleviates the out-of-vocabulary problem.

The experimental results show that the proposed method alleviates the above problems well. The proposed model achieves higher ROUGE-1, ROUGE-2, and ROUGE-L scores, which are improved by 7.2%, 5.9%, and 7.5% respectively compared with the TextRank model, and by 10.6%, 8.1%, and 9.7% respectively compared with the classic Seq2Seq model. To verify the generalization and practicability of the proposed model, this thesis designs and implements a text parsing system to perform cross-dataset testing and timeliness evaluation on the model. The test results show that the proposed model generalizes well and is practical.
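
The soft alignment that the Attention mechanism provides can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not state which scoring function is used, so plain dot-product scoring is assumed here, and the names `softmax`, `attend`, `encoder_states`, and `decoder_state` are hypothetical. At each decoding step, every encoder state receives a weight, the weights sum to 1, and the decoder reads a weighted average (the context vector) instead of a single hard-selected position.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Soft alignment for one decoding step.

    Assumption: dot-product scoring between the decoder state and each
    encoder state; the thesis may use a different scoring function.
    Returns (weights, context).
    """
    # One alignment score per source position.
    scores = [
        sum(d * h for d, h in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    # Context vector: weighted average of the encoder states.
    dim = len(encoder_states[0])
    context = [
        sum(w * state[k] for w, state in zip(weights, encoder_states))
        for k in range(dim)
    ]
    return weights, context


# Toy example: three source positions with 2-dimensional encoder states.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
decoder_state = [2.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
print(weights)
print(context)
```

In this toy example the first source position has the highest score, so it receives the largest weight, while the other positions still contribute to the context vector. That graded weighting is what "soft" alignment refers to.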
Keywords/Search Tags:Text summarization, Seq2Seq, C-RNN encoder, Out-of-vocabulary words, Attention mechanism