| In today’s internet era, a large volume of news enters people’s daily lives, and readers must spend considerable time and effort searching for and identifying content of interest in massive information stores. Text summarization condenses the key information and main content of a news article without changing its meaning or dropping important information, thereby reducing reading time. In recent years, the Pointer-Generator Network (PGNet) has greatly advanced summary generation by effectively addressing out-of-vocabulary (OOV) words and content repetition. However, the model does not fully capture the contextual meaning of sentences, so the generated summaries can lack key information. In addition, it may suffer from vanishing and exploding gradients when the input sequence is too long, leading to poor performance on longer texts. This thesis focuses on improving automatic text summarization based on PGNet. The main research work is as follows: (1) To address PGNet’s insufficient contextual understanding of sentences and the lack of core information in its output, this thesis proposes a summary generation model called Re-PGNet (Reinforced Pointer-Generator Network) built on the pointer-generator network. The model uses a pre-trained BERT language model to extract finer-grained contextual representations of the input sentences. Then, on the encoder side of PGNet, a word-sentence matching matrix fuses each word state with the most relevant information from the original text, strengthening the model’s ability to extract useful information from the source. Finally, keywords are incorporated into the decoding process as prior knowledge, producing a keyword-oriented summary. (2) To address the poor performance of the PGNet model on longer
texts, this thesis divides the summarization task into two stages: an extraction stage and a generation stage. An extractive summarization model called MD-TextRank (Multi-dimensional TextRank), based on TextRank and the fusion of multi-dimensional semantic features, is proposed to extract the sentences that represent the key content of the article. These sentences are used as input to the Re-PGNet model in the generation stage to produce the final summary. MD-TextRank builds on the traditional TextRank algorithm, incorporates BERT representations of the news text, and weights sentences along four dimensions: sentence-topic similarity, sentence-title similarity, keyword coverage, and whether the sentence contains specific feature words. The weighted sentences then serve as candidate summary sentences for the Re-PGNet model. Two conclusions are drawn. First, the experimental results show that the proposed Re-PGNet model with enhanced original-text representation is effective for both short and long text summarization, on the LCSTS Weibo short-text summarization dataset and the NLPCC 2017 long-text news summarization dataset. Compared with current mainstream abstractive summarization methods, the proposed method achieves significant improvements in ROUGE scores. However, ROUGE scores on NLPCC 2017 are lower than those on LCSTS, indicating that the model’s handling of long text still needs improvement. Second, the experiments show that the multi-dimensional weighted extraction method, which selects candidate summaries with the proposed MD-TextRank model, outperforms the TextRank algorithm and other baseline models. Furthermore, the two-stage summaries show significant ROUGE improvements over summaries generated directly by the PGNet model on long texts. |
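The extraction stage described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. Bag-of-words cosine similarity stands in for BERT representations, and the whole-document vector is a crude proxy for the topic. The function names (`bow`, `cosine`, `textrank`, `md_textrank`), the four weights, and the multiplicative way of combining the graph score with the feature score are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def bow(text):
    """Bag-of-words vector; an assumed stand-in for BERT sentence representations."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def textrank(vectors, damping=0.85, iterations=50):
    """PageRank-style power iteration over the sentence-similarity graph."""
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(sim[j][i] / out_weight[j] * scores[j]
                            for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    return scores


def md_textrank(sentences, title, keywords, feature_words,
                weights=(0.3, 0.3, 0.2, 0.2), top_k=2):
    """Rank sentences by graph score boosted with four features (assumed weights)."""
    vectors = [bow(s) for s in sentences]
    topic_vec = bow(" ".join(sentences))  # assumption: document vector as topic proxy
    title_vec = bow(title)
    base = textrank(vectors)
    ranked = []
    for idx, vec in enumerate(vectors):
        features = (
            cosine(vec, topic_vec),                                  # sentence-topic similarity
            cosine(vec, title_vec),                                  # sentence-title similarity
            (sum(1 for k in keywords if k.lower() in vec) / len(keywords)
             if keywords else 0.0),                                  # keyword coverage
            1.0 if any(w.lower() in vec for w in feature_words) else 0.0,  # feature words
        )
        boost = sum(w * f for w, f in zip(weights, features))
        ranked.append((base[idx] * (1.0 + boost), idx))
    # Keep the top_k sentences, restored to their original document order.
    chosen = sorted(idx for _, idx in sorted(ranked, reverse=True)[:top_k])
    return [sentences[i] for i in chosen]


sentences = [
    "The pointer generator network copies words from the source text.",
    "Weather was pleasant yesterday.",
    "In summary, the pointer generator network improves text summarization.",
    "Lunch is served at noon.",
]
candidates = md_textrank(
    sentences,
    title="Pointer generator network for text summarization",
    keywords=["pointer", "summarization"],
    feature_words=["summary"],
)
```

In a two-stage setup of the kind the abstract describes, a list such as `candidates` would then be passed to the generation model as its input in place of the full article.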