
Research And Design Of News Text Summarization Based On Generative Adversarial Net

Posted on: 2022-01-09
Degree: Master
Type: Thesis
Country: China
Candidate: R Hua
GTID: 2518306341982299
Subject: Cyberspace security
Abstract/Summary:
As the mobile Internet has become part of everyday life, a large amount of news text is being generated. Extracting the important content from this mass of news text accurately and efficiently has become an urgent need in the industry. Automatic text summarization can preserve the integrity of the information while reducing the volume of text, thus improving the efficiency of accessing information. For generating summaries, the most widely used model is the sequence-to-sequence model (Seq2Seq). Because of the flexibility of text summarization and the limitations of the network structure, the generated summaries are prone to word repetition, overly short sentences, grammatical errors, and inconsistency with the meaning of the original text. To address these problems, this thesis proposes a news text summarization method based on generative adversarial networks, which extracts summary information from Chinese and English news texts and improves the quality of the generated summaries. The main contributions are as follows.
1) An overall architecture for a text summarization algorithm based on generative adversarial networks is proposed: a Seq2Seq generator model and a two-input, two-loss-function discriminator model based on a pre-trained language model.
2) The input and output of the generator are improved. On the input side, word boundary information is added to the word-level input to introduce prior knowledge. On the output side, a dynamic weighted softmax output is proposed to reduce the accumulation of model errors and the likelihood of several consecutive repeated, meaningless words.
3) A two-input, two-loss-function discriminator model based on a pre-trained language model is proposed. It supervises the Seq2Seq generator so that the generated summaries are highly readable and have the same meaning as the original text.
4) The model is evaluated on the Chinese data set LCSTS and the English data set Gigaword. The evaluation metrics are the ROUGE scores, statistics of the generated summary lengths, and manual ratings of generated summary quality.
The experimental results show that each of the proposed methods improves the ROUGE scores, and that the ROUGE-1 and ROUGE-2 scores are higher than those of existing models. On the LCSTS data set, ROUGE-1 is 39.22 and ROUGE-2 is 26.10; on the Gigaword data set, ROUGE-1 is 37.10 and ROUGE-2 is 18.13. Compared with the Transformer-based Seq2Seq model, the length statistics of the generated summaries are closer to those of the reference summaries. The manual ratings also improve significantly: the number of summaries scoring 3 to 5 increases by 27.67% on the LCSTS data set and by 11.95% on the Gigaword data set. These results demonstrate that the proposed methods improve the ROUGE scores, alleviate to a certain extent the problems of word repetition, overly short sentences, grammatical errors, and inconsistency with the meaning of the original text, and improve the quality of the generated summaries.
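For readers unfamiliar with the ROUGE-1 and ROUGE-2 scores reported above, the following is a minimal sketch of how ROUGE-N is commonly computed from n-gram overlap between a generated summary and a reference summary. It is not the thesis's evaluation code: the abstract does not state which ROUGE variant (recall, precision, or F1), tokenization, or toolkit was used, so the function names `ngrams` and `rouge_n`, the whitespace tokenization, and the example sentences are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n):
    """Return ROUGE-N precision, recall and F1 for two token lists.

    Overlap uses clipped counts: an n-gram is matched at most as many
    times as it occurs in both the candidate and the reference.
    """
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    overlap = sum((cand & ref).values())  # multiset intersection
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical example sentences (not taken from LCSTS or Gigaword).
reference = "police kill the gunman".split()
candidate = "police killed the gunman".split()

rouge_1 = rouge_n(candidate, reference, 1)  # 3 of 4 unigrams match
rouge_2 = rouge_n(candidate, reference, 2)  # 1 of 3 bigrams matches
print(rouge_1, rouge_2)
```

For Chinese data such as LCSTS, the token lists would typically hold characters or segmented words rather than whitespace-split tokens; which of these the thesis used is not stated in the abstract.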
Keywords/Search Tags:text summarization, generative adversarial net, a dual input dual loss function discriminator, dynamic weighted softmax output, word boundary information input