Text summarization, a key subfield of natural language processing, plays an important role in everyday life. In modern society, where the volume of information is growing rapidly, people can easily obtain information of many kinds. Compared with images and video streams, long text documents take noticeably more time to read and understand. Text summarization therefore meets people's need for efficiency when acquiring information from long documents. However, current text summarization models and systems still face difficulties and challenges. One of the biggest is the factual inconsistency problem: the facts contained in a machine-generated summary are inconsistent with those in the corresponding input text. Such flaws undoubtedly reduce the credibility and reliability of text summarization models and systems. This thesis therefore studies, discusses, and analyzes the factual inconsistency problem in text summarization, and proposes factual improvement methods that are especially suitable for Chinese text summarization. The primary work of this thesis is as follows:

(1) This thesis proposes a factual detection algorithm for summarization based on generative negative sampling. The algorithm mainly targets the "named entity error", the most frequent kind of error among factual inconsistency problems. It contains three generative negative sampling methods, which use a text summarization model to artificially construct incorrect summaries containing named entity errors. In this way, a semi-supervised binary classification dataset is obtained, with reference summaries as positive samples and the incorrect summaries produced by generative negative sampling as negative samples. A factual detection model, which detects whether a summary contains named entity errors, is then trained on this dataset. In the training stage, this thesis incorporates PU Learning (positive-unlabeled learning) into the training process of the factual detection model, which further improves its performance in detecting factual errors. Experiments show that the factual detection model can be used not only to detect whether machine-generated summaries have factual inconsistency problems, but also to rank several candidate summaries generated from the same context by factuality, so that the most factual one is selected as the final output.

(2) This thesis proposes a method for fact-aware text summarization that incorporates keywords of essential elements. Based on observation and analysis of related work on factual enhancement for text summarization and of the outputs of several advanced Chinese summarization models, this thesis combines prompts and part-of-speech tagging to train a keyword extraction model, which extracts four types of keywords: "time", "position", "object", and "numerical words". Injecting the extracted keywords into the input of the text summarization model provides additional factual information that assists generation and improves the factual quality of the generated summaries.

(3) Combining the above two factual improvement approaches, this thesis designs and implements a system that helps users obtain information from the Internet more efficiently and improves the reliability of machine-generated summaries.
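
The two inference-time ideas described above (injecting extracted keywords into the summarizer input, and ranking candidate summaries by factuality) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the `[SEP]`-style input format, and in particular `toy_factual_score`, which is a crude stand-in (it only checks whether capitalized words and numbers in the summary appear in the source) for the trained factual detection model. It does not reproduce the thesis's models or its generative negative sampling.

```python
import re
from typing import Callable, Dict, List, Sequence


def inject_keywords(document: str, keywords: Dict[str, List[str]],
                    sep: str = " [SEP] ") -> str:
    """Prepend extracted keywords to the summarizer input.

    The "type: words; ... [SEP] document" layout is a hypothetical format,
    not the one used in the thesis.
    """
    parts = [f"{kind}: {', '.join(words)}"
             for kind, words in keywords.items() if words]
    if not parts:
        return document
    return "; ".join(parts) + sep + document


def toy_factual_score(document: str, summary: str) -> float:
    """Toy stand-in for a factual detection model (assumption).

    Returns the fraction of capitalized words and numbers in the summary
    that also occur verbatim in the source document.
    """
    entities = re.findall(r"[A-Z][a-z]+|\d+(?:\.\d+)?", summary)
    if not entities:
        return 1.0  # nothing to contradict
    supported = sum(1 for entity in entities if entity in document)
    return supported / len(entities)


def rank_by_factuality(
    document: str,
    candidates: Sequence[str],
    score: Callable[[str, str], float] = toy_factual_score,
) -> List[str]:
    """Sort candidate summaries from most to least factual under `score`."""
    return sorted(candidates, key=lambda s: score(document, s), reverse=True)


if __name__ == "__main__":
    doc = "Apple reported revenue of 90 billion dollars on Thursday in Cupertino."
    candidates = [
        "Google reported revenue of 80 billion dollars.",
        "Apple reported revenue of 90 billion dollars.",
    ]
    print(rank_by_factuality(doc, candidates)[0])
    print(inject_keywords(doc, {"time": ["Thursday"], "object": ["Apple"]}))
```

In a real system, the `score` argument would be replaced by the probability output of the trained detection model, and the keyword dictionary would come from the keyword extraction model rather than being written by hand.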