
Research And Implementation Of An Automatic Text Summarization System For The Journalism Domain

Posted on: 2024-08-09 | Degree: Master | Type: Thesis
Country: China | Candidate: Z M Yang | Full Text: PDF
GTID: 2568307055498044 | Subject: Computer technology
Abstract/Summary:
The amount of textual information on major news websites and social media platforms grows every day. Automatic text summarization helps readers reconcile this information overload with the need to read quickly, saving time and improving efficiency. Current techniques fall into two families: extractive and generative (abstractive). Extractive summarization handles long texts well but does not capture semantic information. Generative summarization can produce new sentences and is more flexible, but for longer texts information may be lost when the input is truncated. This thesis addresses these issues by studying both approaches and by designing and implementing an automatic text summarization system for the news domain. The main contributions are as follows:

(1) A BERT-based extractive summarization model is proposed. Extractive summarization takes more than two sentences as input, whereas BERT expects a pair of concatenated text segments. To adapt BERT to the summarization task, this thesis modifies its input sequence and embedding method, adds a Transformer layer after the BERT output to learn deep semantic features, and selects the sentences with the highest scores as summary sentences.

(2) A BART-based generative summarization model is proposed. To adapt BART to Chinese news texts, this thesis modifies the model's sentence coding layer and summary judgment layer. Experimental results show that the model substantially improves summary quality compared with the traditional generative summarization model.

(3) Building on the above work, this thesis proposes an automatic text summarization system for the news domain that combines the extractive and generative approaches. The extractive method first selects several key sentences from the original text, which compresses the information it contains. The generative method then produces the final news summary from those sentences. This improves the efficiency of the overall model and also solves the problem of long texts being truncated during generative summarization.
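The extract-then-generate flow described in contribution (3) can be sketched as below. This is a minimal illustration under stated assumptions, not the thesis's implementation: the word-frequency scorer is a stand-in for the BERT-based sentence scorer, `generate` is a hypothetical callable standing in for the BART-based model, and all function names are invented for this example.

```python
import re
from collections import Counter
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    # Split on Chinese or Western sentence-ending punctuation.
    parts = re.findall(r"[^。！？.!?]+[。！？.!?]?", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence: str) -> List[str]:
    # Latin words/numbers as whole tokens, CJK text as single characters.
    return re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]", sentence.lower())


def score_sentences(sentences: List[str]) -> List[float]:
    # ASSUMPTION: average document-level token frequency, used here only
    # as a placeholder for a learned (e.g. BERT-based) sentence scorer.
    freq = Counter(tok for s in sentences for tok in tokenize(s))
    scores = []
    for s in sentences:
        toks = tokenize(s)
        scores.append(sum(freq[t] for t in toks) / len(toks) if toks else 0.0)
    return scores


def extract_key_sentences(text: str, k: int = 3) -> List[str]:
    # Keep the k highest-scoring sentences, restored to original order.
    sentences = split_sentences(text)
    scores = score_sentences(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


def summarize(text: str, generate: Callable[[str], str], k: int = 3) -> str:
    # Stage 1 compresses the article so that stage 2 sees a short input
    # and is less likely to hit the generator's length limit.
    compressed = " ".join(extract_key_sentences(text, k))
    # Stage 2: `generate` is a hypothetical abstractive model wrapper.
    return generate(compressed)
```

In use, `generate` would wrap whatever abstractive model is available; passing a trivial function such as `lambda s: s` returns the extracted sentences unchanged, which is a convenient way to inspect the first stage on its own.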
Keywords/Search Tags: Automatic text summarization, Extractive summarization, Abstractive summarization, Journalism domain