
Chinese Text Summarization Technology Based On Improved BERT Pre-training Model And Graph Neural Network

Posted on: 2022-11-02  Degree: Master  Type: Thesis
Country: China  Candidate: Y M Xu  Full Text: PDF
GTID: 2518306749471974  Subject: Automation Technology
Abstract/Summary:
In order to enable people to accurately obtain key content from large amounts of textual information, text summarization techniques have attracted wide attention. With the development of natural language processing and deep learning, many deep learning-based text summarization methods have been proposed. By studying current text summarization models, this thesis finds that models for Chinese suffer from split semantics, incoherent generated summaries, excessive redundant information, and an inability to handle long sentences effectively. In response to these problems, this thesis proposes an extractive-generative Chinese text summarization model. The specific research work is as follows:

(1) An improved BERT-based extractive text summarization model adapted to Chinese was constructed. Instead of character vectors, the model takes as input word vectors obtained after Chinese word segmentation, which reduces the occurrence of semantic fragmentation. In the MLM pre-training task, the masking strategy was changed to dynamic long-sequence masking to improve the model's ability to understand words and sentences (a sketch of this idea follows the abstract). To handle long texts effectively, the model employs layer-wise position encoding to generate the position information of the word vectors as another part of the input. The segment embedding and the NSP pre-training task were also removed, and long sequences were trained on directly to reduce the interference of noise during training. Ablation experiments verified that each of these changes achieves its intended purpose.

(2) A GNN-based generative text summarization method was proposed. Using the Graph2Seq model, which maps graph data directly to sequences, the summary produced by the extractive model is converted into graph-structured data, graph vectors are generated with a graph encoder, and keyword attention and graph attention are introduced to build a decoder that fuses multiple attentions to generate the final summary (a sketch of such a decoder step is also given below). This method effectively exploits the characteristics of graph-structured data and GNNs to generate more concise and fluent summaries.

A comparison between the proposed model and several baseline models was conducted on the Chinese dataset NLPCC 2017. The extractive-generative model proposed in this thesis scored 40.07%, 23.17%, and 32.27% on the ROUGE-1, ROUGE-2, and ROUGE-L metrics, respectively (a minimal character-level ROUGE illustration is given below), all higher than those of the common baseline models, and the summaries it generated were more concise and better matched the topic. In summary, the text summarization model proposed in this thesis performs better on the Chinese dataset and has certain reference value.
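The abstract names dynamic long-sequence masking but does not spell out the procedure. Below is a minimal Python sketch of one plausible reading, in which a fresh set of contiguous token spans is masked every time a sequence is seen; the function name, the -100 ignore label, and the span-length limit are illustrative assumptions, not the thesis implementation.

```python
import random

def dynamic_span_mask(token_ids, mask_id, mask_prob=0.15, max_span=3):
    """Sample a fresh MLM mask over contiguous token spans (illustrative only).

    Re-sampling on every pass ("dynamic" masking) means the model never sees
    the same masked pattern twice; masking whole spans forces it to
    reconstruct longer stretches of text rather than isolated tokens.
    """
    if not token_ids:
        return [], []
    ids = list(token_ids)
    labels = [-100] * len(ids)                # -100: position ignored by the loss
    budget = max(1, int(len(ids) * mask_prob))
    masked = 0
    while masked < budget:
        span = random.randint(1, max_span)    # length of the span to mask
        start = random.randrange(len(ids))
        for i in range(start, min(start + span, len(ids))):
            if labels[i] == -100:             # not yet masked
                labels[i] = ids[i]            # keep the original token as target
                ids[i] = mask_id
                masked += 1
    return ids, labels
```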
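Likewise, the decoder that integrates keyword attention and graph attention is only named, not specified. The following PyTorch sketch shows one way such a decoding step could be assembled, assuming the graph encoder yields one vector per node and the keywords are encoded separately; the class name and tensor layout are hypothetical.

```python
import torch
import torch.nn as nn

class MultiAttentionDecoderStep(nn.Module):
    """One decoding step fusing graph attention and keyword attention (sketch)."""

    def __init__(self, hidden: int, vocab_size: int):
        super().__init__()
        self.graph_attn = nn.MultiheadAttention(hidden, num_heads=1, batch_first=True)
        self.kw_attn = nn.MultiheadAttention(hidden, num_heads=1, batch_first=True)
        self.out = nn.Linear(3 * hidden, vocab_size)

    def forward(self, dec_state, node_states, keyword_states):
        # dec_state:      (batch, 1, hidden)      current decoder hidden state
        # node_states:    (batch, nodes, hidden)  graph-encoder outputs
        # keyword_states: (batch, kws, hidden)    encoded keywords
        g_ctx, _ = self.graph_attn(dec_state, node_states, node_states)
        k_ctx, _ = self.kw_attn(dec_state, keyword_states, keyword_states)
        fused = torch.cat([dec_state, g_ctx, k_ctx], dim=-1)
        return self.out(fused)                  # (batch, 1, vocab) next-token logits
```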
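For reference, ROUGE-N for Chinese is typically computed at the character level. The tiny self-contained function below shows the core of the metric (clipped n-gram overlap divided by the number of reference n-grams, i.e. recall); full toolkits additionally report precision, F1, and the LCS-based ROUGE-L used in the experiments above.

```python
from collections import Counter

def rouge_n_recall(candidate: str, reference: str, n: int = 1) -> float:
    """Character-level ROUGE-N recall (minimal illustration, not a full toolkit)."""
    def ngrams(text: str) -> Counter:
        return Counter(tuple(text[i:i + n]) for i in range(len(text) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / max(1, sum(ref.values()))
```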
Keywords/Search Tags: Chinese, Text Summarization, BERT, GNN