
Research On Text Summarization Method Based On Representation Learning

Posted on: 2024-09-28 | Degree: Master | Type: Thesis
Country: China | Candidate: J Long | Full Text: PDF
GTID: 2568307151960509 | Subject: Computer Science and Technology
Abstract/Summary:
Automatic text summarization aims to generate, for a given text, a summary that captures its main content; it has significant application value in news reporting and public-opinion analysis. Because automatic summarization is closely tied to text representation, this thesis studies the two mainstream families of summarization algorithms from the perspective of representation learning: abstractive summarization and sentence-level extractive summarization. The main research content is as follows:

First, abstractive summarization generates a summary word by word conditioned on the source text; the summary is not limited to the sentences and words that appear in the source, so the approach is highly flexible. During generation, however, high-frequency words in the text become summary candidates at a disproportionately high rate, causing the generated summaries to deviate noticeably from the reference summaries. Analysis shows that this is a word-frequency bias caused by distortion of the pre-trained model's word-embedding space. To address this problem, this thesis proposes an abstractive summarization framework based on word-embedding representation correction. The algorithm first uses a clustering algorithm together with a neighborhood measurement method to identify the embedding dimensions most strongly affected by word frequency, and then applies a representation-correction module to adjust the distribution of the word embeddings, eliminating the influence of word-frequency features on the embedding distribution (a minimal code sketch of this idea is given below).

Second, sentence-level extractive summarization forms a summary by selecting sentences from the text; because the summary content comes directly from the source, it is highly reliable. The mainstream way to obtain a sentence embedding is to mean-pool the word embeddings. The longer a sentence is, the larger the proportion of non-key information it contains and the more strongly that information dominates the sentence embedding, so extractive summarization exhibits a sentence-length bias and tends to select shorter sentences as summary candidates. To address this length bias, this thesis proposes an extractive summarization framework based on a dimension-level attention mechanism. Specifically, the algorithm introduces features at a finer granularity than whole word embeddings, attending over tokens separately in each embedding dimension to reduce the proportion of non-key information; the resulting sentence embeddings carry more complete information and richer feature values. The sentence embeddings are then fed through activation functions and a neural network to produce the extractive summarization result (a minimal code sketch of this pooling mechanism is also given below).

Finally, the proposed methods are evaluated on three public text summarization datasets: CNN/DM, XSum, and WikiHow. Experimental results show that both proposed summarization algorithms exceed the baseline models in summary quality, verifying the effectiveness and superiority of the proposed methods.
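As one possible illustration of the representation-correction idea summarized above, the sketch below scores each embedding dimension by its correlation with word frequency and removes the frequency-predictable component from the most affected dimensions. The thesis itself uses a clustering algorithm and a neighborhood measurement method, which are not reproduced here; the correlation-based scoring, the `top_k` parameter, and the `correct_frequency_bias` function are illustrative assumptions, not the author's implementation.

```python
# Rough sketch of frequency-bias correction: locate the embedding dimensions
# most sensitive to word frequency and neutralise them. The Pearson-style
# scoring and the simple regression-residual step are assumptions for
# illustration, not the thesis's exact algorithm.
import numpy as np

def correct_frequency_bias(embeddings: np.ndarray,
                           frequencies: np.ndarray,
                           top_k: int = 8) -> np.ndarray:
    """embeddings: (vocab_size, dim) vectors from a pre-trained model.
    frequencies: (vocab_size,) corpus frequency of each word."""
    log_freq = np.log1p(frequencies)                 # dampen heavy-tailed counts
    centered = embeddings - embeddings.mean(axis=0)  # remove the global mean

    # Per-dimension correlation with log frequency: a proxy for how strongly
    # each coordinate encodes word frequency rather than semantics.
    lf = log_freq - log_freq.mean()
    corr = (centered * lf[:, None]).sum(axis=0)
    corr /= np.linalg.norm(centered, axis=0) * np.linalg.norm(lf) + 1e-12

    # Pick the top_k most frequency-sensitive dimensions ...
    biased_dims = np.argsort(-np.abs(corr))[:top_k]

    # ... and subtract the component of each that is linearly predictable
    # from log frequency, leaving the remaining dimensions untouched.
    corrected = centered.copy()
    for d in biased_dims:
        beta = centered[:, d] @ lf / (lf @ lf + 1e-12)
        corrected[:, d] = centered[:, d] - beta * lf
    return corrected
```

The corrected embeddings would then replace the original pre-trained embeddings in the summarization model, so that generation probabilities are driven less by raw word frequency; how the thesis wires the corrected representations into the generator is not specified in the abstract.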
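The dimension-level attention pooling can be pictured along the following lines: instead of mean pooling, each embedding dimension receives its own attention distribution over the tokens of a sentence, so long sentences do not dilute key information uniformly. This is a minimal PyTorch sketch under assumed shapes; the `DimensionLevelAttentionPooling` class, the single-layer scorer, and the 768-dimensional example are hypothetical and may differ from the thesis's architecture.

```python
# Minimal sketch of dimension-level attention pooling for sentence embeddings.
import torch
import torch.nn as nn

class DimensionLevelAttentionPooling(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Produces one attention score per token *per embedding dimension*.
        self.scorer = nn.Linear(dim, dim)

    def forward(self, word_embeddings: torch.Tensor,
                mask: torch.Tensor) -> torch.Tensor:
        """word_embeddings: (batch, seq_len, dim); mask: (batch, seq_len), 1 for real tokens."""
        scores = self.scorer(torch.tanh(word_embeddings))           # (B, L, D)
        scores = scores.masked_fill(mask.unsqueeze(-1) == 0, -1e9)  # ignore padding
        weights = torch.softmax(scores, dim=1)                      # normalise over tokens, per dimension
        return (weights * word_embeddings).sum(dim=1)               # (B, D) sentence embedding

# Usage: pool BERT-style token embeddings into sentence embeddings; a small
# feed-forward scorer over these embeddings would then pick summary sentences.
pool = DimensionLevelAttentionPooling(dim=768)
tokens = torch.randn(2, 40, 768)
mask = torch.ones(2, 40)
sentence_embeddings = pool(tokens, mask)  # shape (2, 768)
```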
Keywords/Search Tags: Abstractive summarization, extractive text summarization, representation correction algorithm, dimension-level attention mechanism, representation learning