
Research On Text Summarization Method Based On BERT Model

Posted on: 2022-04-21
Degree: Master
Type: Thesis
Country: China
Candidate: Q Pan
Full Text: PDF
GTID: 2518306539453174
Subject: Software engineering
Abstract/Summary:
In the era of social networks, the rapid growth of data in information retrieval and natural language processing makes automatic text summarization necessary, and how to process and utilize text resources effectively has become a research hotspot. Automatic text summarization compresses the information in input documents while retaining their key content, converting a text or a text collection into a short summary. Current summarization methods fall into two categories: extractive and abstractive. Most existing methods have strong encoding capabilities; nevertheless, they fail to handle long-range dependencies in text and produce semantically inaccurate output. This thesis therefore studies in depth a major problem, namely that generated summaries do not match the source text. The main research work is as follows:

(1) A topic-embedding-based automatic text summarization framework is proposed. This work studies how latent topics obtained through topic modeling can guide the direction of text generation. Topic modeling based on word co-occurrence, however, cannot overcome the limited information and vocabulary of a given text. We therefore propose a topic-aware extractive and abstractive summarization model that combines BERT with a neural topic model. First, the method matches the latent topic representation encoded by the neural topic model with the BERT embedding, guiding topic generation to meet the requirements of the text's semantic representation. Second, it uses the Transformer architecture to explore topic inference and text summarization jointly in an end-to-end manner, with the self-attention mechanism modeling long-distance dependencies while capturing semantic features. In addition, a two-stage extractive-abstractive model is built so that the two stages share information and complement each other's strengths. The experiments achieve the highest ROUGE scores, indicating the importance of topic representation for semantic representation and confirming the effectiveness of the method.

(2) A knowledge-enhanced abstractive text summarization framework is proposed. This work addresses the problem that false information in a summary is inconsistent with the facts in the source text: the goal is both to preserve the topical information of the source text and to ensure the factual consistency of the generated summary. We therefore propose a text summarization model based on BERT and knowledge enhancement, which introduces additional structured knowledge through a knowledge graph. First, the method uses dual encoders, a BERT-based document encoder and a knowledge graph encoder, so that the model complements contextual features with structured information. Second, it introduces sub-topic enhancement: each paragraph is encoded as a sub-graph to obtain a sub-topic knowledge representation, which is integrated together with document-level knowledge into the decoding process. In addition, the FCM algorithm is used to inject comments and strengthen the model's generation ability. Automatic and manual evaluation on the CNN/Daily Mail and XSum datasets shows that the model captures the original topic and stays consistent with it while improving the factual accuracy of the summaries.
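The abstract does not specify how the latent topic representation is matched with the BERT embedding, so the following is only a minimal NumPy sketch of one plausible mechanism: a topic distribution is inferred from the document's bag-of-words, projected into the encoder's hidden space, and gate-fused with each token state. All names (`topic_aware_states`, the weight matrices) and the gated-fusion choice are illustrative assumptions, with random matrices standing in for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, n_topics, seq_len, vocab = 8, 4, 5, 20

# Randomly initialized stand-ins for learned weights (assumption: real
# models would train these end-to-end with the summarizer).
W_ntm  = rng.normal(size=(vocab, n_topics))   # NTM inference layer
W_proj = rng.normal(size=(n_topics, hidden))  # topic -> hidden space
W_gate = rng.normal(size=(2 * hidden, hidden))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def topic_aware_states(token_states, bow):
    """Fuse a latent topic vector with per-token encoder states."""
    theta = softmax(bow @ W_ntm)                  # topic distribution
    topic = theta @ W_proj                        # topic embedding
    topic = np.broadcast_to(topic, token_states.shape)
    gate = sigmoid(np.concatenate([token_states, topic], -1) @ W_gate)
    # Convex per-dimension mix of contextual and topical information.
    return gate * token_states + (1 - gate) * topic

tokens = rng.normal(size=(seq_len, hidden))  # stand-in for BERT states
bow = rng.random(vocab)                      # document bag-of-words
fused = topic_aware_states(tokens, bow)
assert fused.shape == (seq_len, hidden)
```

The gated mix keeps each fused state between the contextual and topical signals, one simple way a decoder could be steered toward the document's topics without discarding token-level context.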
Keywords/Search Tags:Text Summarization, Neural Topic Model, BERT, Knowledge Graph