
The Optimization Of Extractive Text Summarization Based On Pretrained Language Model

Posted on: 2022-09-12
Degree: Master
Type: Thesis
Country: China
Candidate: H F Guo
Full Text: PDF
GTID: 2518306572997559
Subject: Computer technology
Abstract/Summary:
Pretrained language models are widely used across natural language processing tasks. In this thesis, RoBERTa is transferred to the extractive text summarization task, and four optimization methods are applied to improve the quality of the extracted summaries. A hierarchical encoder mechanism is proposed to address the problem of text truncation: it consists of a sentence-level encoder, RoBERTa, and a document-level encoder, a Transformer encoder, which retains more of the source text while providing higher-level information integration. To make better use of the textual relationships within a document, discourse graphs are built from coreference links, and a graph convolutional network is used to update the graph node representations. Two two-stage approaches are also applied. The extract-then-match approach dynamically determines the number of extracted sentences and optimizes the sentence combination by matching candidate summaries: the matching model is a siamese network based on RoBERTa, which maps the original text, the reference summary, and the candidate summaries into the same semantic space, and the candidate summary closest to the original text is selected. The extract-then-rewrite approach is used to reduce the redundancy of the extractive summary: the abstractor is a Transformer whose encoder is replaced by RoBERTa, and the rewriting operation turns the extractive summary into an abstractive one through the abstractor. Finally, comparative experiments are conducted on the CNN/Daily Mail dataset. The experimental results and analysis show that the four improvements to the extractive summarization method address the corresponding problems and further improve the ROUGE scores, and the summaries generated by the models match the original text better.
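To make the hierarchical encoder mechanism concrete, the sketch below shows one plausible way to pair a sentence-level RoBERTa encoder with a document-level Transformer encoder and a per-sentence scoring head for extraction. It is a minimal illustration under stated assumptions, not the thesis's exact model: the Hugging Face "roberta-base" checkpoint, layer counts, and the top-k extraction step are illustrative choices.

import torch
import torch.nn as nn
from transformers import RobertaModel, RobertaTokenizer

class HierarchicalExtractor(nn.Module):
    """Sentence-level RoBERTa encoder + document-level Transformer encoder (illustrative)."""
    def __init__(self, doc_layers: int = 2, doc_heads: int = 8):
        super().__init__()
        self.sent_encoder = RobertaModel.from_pretrained("roberta-base")
        hidden = self.sent_encoder.config.hidden_size  # 768 for roberta-base
        doc_layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=doc_heads, batch_first=True)
        self.doc_encoder = nn.TransformerEncoder(doc_layer, num_layers=doc_layers)
        self.scorer = nn.Linear(hidden, 1)  # per-sentence extraction score

    def forward(self, input_ids, attention_mask):
        # input_ids / attention_mask: (num_sentences, seq_len) for one document,
        # so each sentence fits within RoBERTa's length limit and the document is not truncated.
        out = self.sent_encoder(input_ids=input_ids, attention_mask=attention_mask)
        sent_vecs = out.last_hidden_state[:, 0, :]           # <s> token as sentence embedding
        doc_ctx = self.doc_encoder(sent_vecs.unsqueeze(0))   # (1, num_sentences, hidden)
        return self.scorer(doc_ctx).squeeze(-1)              # (1, num_sentences) logits

# Usage: tokenize sentences separately, score them, and extract the top-k.
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
sentences = ["The first sentence of the article.", "A second, less important sentence."]
batch = tokenizer(sentences, padding=True, truncation=True, max_length=128, return_tensors="pt")
model = HierarchicalExtractor()
with torch.no_grad():
    scores = model(batch["input_ids"], batch["attention_mask"])
extracted = torch.topk(scores.squeeze(0), k=1).indices  # indices of extracted sentences

In this sketch the document-level encoder sees one vector per sentence, so long documents are handled at the sentence-count level rather than the token level, which is the motivation behind the hierarchical design described above.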
Keywords/Search Tags:Pretrained language model, Text summarization, Hierarchical encoder mechanism, Graph neural network, Two-stage approach