
Transformer-Based Automatic Summarization of Online News

Posted on: 2024-04-07
Degree: Master
Type: Thesis
Country: China
Candidate: S Hao
Full Text: PDF
GTID: 2568307058953289
Subject: Master of Electronic Information (Professional Degree)
Abstract/Summary:
With the rapid development of Internet information technology, and in particular the exponential growth of text-based content, readers face a massive influx of information every day. The increasing prevalence of "click-bait" news articles compounds the problem. There is therefore an urgent need for tools that produce easily understandable summaries to alleviate information overload. Text summarization technology, which aims to generate concise and informative summaries of online news articles, has become a major research focus both in China and abroad. However, existing summarization models often produce inaccurate content and poorly constructed sentences, fail to resolve the polysemy of Chinese words, and frequently miss the main theme and key information of the source text. To address these issues, this thesis pursues two lines of work.

(1) Current network models struggle to capture the main theme of a text from a macro perspective, which leads to inaccuracies in the generated content. This thesis therefore proposes a Transformer network that incorporates topic information: a keyword extraction algorithm mines the main theme of the text, and the extracted keywords are fed into the model's computation so that it generates a topic-oriented summary guided by those keywords (an illustrative sketch of one such fusion scheme appears after this abstract). To resolve polysemy, word vectors are trained with a pre-trained language model, yielding distinct feature encodings for the different senses of a word, which are then used during summary generation. The model's effectiveness is validated on a real-world dataset.

(2) In long sequences, irrelevant noise in the attention computation can reduce the model's accuracy. This thesis proposes a gating network that filters out low-attention noise and improves the model's efficiency in processing Chinese characters, thereby improving the accuracy of summary generation (see the gated-attention sketch below). To address word-order ambiguity in Chinese, relative position encoding is used to enrich the order and structure information carried by the word vectors, strengthening the model's grasp of positional relationships between words and improving the coherence of the generated sentences (a minimal relative-position sketch is also given below). In addition, an improved decoding strategy brings the generated summaries closer to human reference summaries. Experimental results on the LCSTS dataset show significant improvements in the ROUGE-1, ROUGE-2, and ROUGE-L metrics.
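The abstract does not specify how the extracted keywords are incorporated into the Transformer. The following is a minimal PyTorch sketch of one plausible scheme: an encoder layer with an extra cross-attention sub-layer over keyword (topic) embeddings. The class name, the cross-attention placement, and the dimensions are illustrative assumptions, not taken from the thesis.

import torch
import torch.nn as nn

class KeywordFusionEncoderLayer(nn.Module):
    # Transformer encoder layer with an extra cross-attention sub-layer
    # over keyword (topic) embeddings. Hypothetical fusion scheme: the
    # thesis only states that keywords are incorporated into the model.
    def __init__(self, d_model=512, nhead=8, dim_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout, batch_first=True)
        self.topic_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, dim_ff), nn.ReLU(), nn.Linear(dim_ff, d_model))
        self.norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(3)])
        self.drop = nn.Dropout(dropout)

    def forward(self, x, topic):
        # x:     (batch, src_len, d_model)  token representations
        # topic: (batch, n_kw, d_model)     embeddings of extracted keywords
        h, _ = self.self_attn(x, x, x)
        x = self.norms[0](x + self.drop(h))
        h, _ = self.topic_attn(x, topic, topic)   # attend to the topic keywords
        x = self.norms[1](x + self.drop(h))
        return self.norms[2](x + self.drop(self.ff(x)))

# usage: fuse five extracted-keyword embeddings into a 100-token article
layer = KeywordFusionEncoderLayer()
out = layer(torch.randn(2, 100, 512), torch.randn(2, 5, 512))  # (2, 100, 512)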
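"Gated filtering" of low-attention noise could likewise be realized in several ways. The sketch below assumes a fixed weight threshold tau that zeroes out small attention weights, followed by a learned sigmoid gate that mixes the attended context with the input; both the threshold value and the gate placement are assumptions for illustration.

import torch
import torch.nn as nn

class GatedFilterAttention(nn.Module):
    # Single-head scaled dot-product self-attention whose weights below a
    # threshold tau are zeroed out ("low-attention noise"), followed by a
    # learned sigmoid gate on the attended context. tau and the gate
    # placement are illustrative, not the thesis's exact design.
    def __init__(self, d_model=512, tau=0.01):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.gate = nn.Linear(d_model, d_model)
        self.tau = tau
        self.scale = d_model ** -0.5

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        attn = torch.softmax(self.q(x) @ self.k(x).transpose(-2, -1) * self.scale, dim=-1)
        attn = torch.where(attn >= self.tau, attn, torch.zeros_like(attn))  # filter noise
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-9)        # renormalize
        ctx = attn @ self.v(x)
        g = torch.sigmoid(self.gate(x))   # element-wise gate in [0, 1]
        return g * ctx + (1 - g) * x      # gated residual mix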
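Finally, the thesis does not say which relative position encoding variant it uses. A common formulation adds a learned bias, indexed by the clipped query-key offset, to the attention logits before the softmax (in the spirit of Shaw et al., 2018); the sketch below is illustrative only.

import torch
import torch.nn as nn

class RelativePositionBias(nn.Module):
    # Learned bias indexed by the clipped query-key offset, to be added to
    # the attention logits before softmax (in the spirit of Shaw et al., 2018).
    def __init__(self, nhead=8, max_dist=128):
        super().__init__()
        self.max_dist = max_dist
        self.bias = nn.Embedding(2 * max_dist + 1, nhead)

    def forward(self, q_len, k_len):
        offset = torch.arange(k_len)[None, :] - torch.arange(q_len)[:, None]
        offset = offset.clamp(-self.max_dist, self.max_dist) + self.max_dist
        return self.bias(offset).permute(2, 0, 1)  # (nhead, q_len, k_len)

# usage: given attention scores of shape (batch, nhead, q_len, k_len),
# add the bias before softmax: scores = scores + RelativePositionBias()(q_len, k_len)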
Keywords/Search Tags:automatic summarization, Transformer network, gated filtering, keyword fusion, relative position encoding