
Research On Text Summarization Method Based On Representation Learning

Posted on: 2024-09-28 | Degree: Master | Type: Thesis
Country: China | Candidate: J Long | Full Text: PDF
GTID: 2568307151960509 | Subject: Computer Science and Technology
Abstract/Summary:
Automatic text summarization aims to generate, for a given text, a summary that captures its main content; it has significant application value in news reporting and public-opinion analysis. Because automatic summarization is closely tied to text representation, this thesis studies the two mainstream families of summarization algorithms from the perspective of representation learning: abstractive summarization and sentence-level extractive summarization. The main research content is as follows:

First, abstractive summarization generates a summary word by word conditioned on the source text; the summary is not limited to the sentences and words that appear in the source, so the approach is highly flexible. During generation, however, high-frequency words in the text become summary candidates at a disproportionately high rate, causing the generated summaries to deviate noticeably from the reference summaries. Analysis shows that this is a word-frequency bias caused by distortion of the pre-trained model's word-embedding space. To address this problem, this thesis proposes an abstractive summarization framework based on word-embedding representation correction. The algorithm first uses a clustering algorithm together with a neighborhood measurement method to identify the embedding dimensions most strongly affected by word frequency, and then applies a representation-correction module to adjust the distribution of the word embeddings, eliminating the influence of word-frequency features on the embedding distribution (a minimal code sketch of this idea is given below).

Second, sentence-level extractive summarization forms a summary by selecting sentences from the text; because the summary content comes directly from the source, it is highly reliable. The mainstream way to obtain a sentence embedding is to mean-pool the word embeddings. The longer a sentence is, the larger the proportion of non-key information it contains and the more strongly that information dominates the sentence embedding, so extractive summarization exhibits a sentence-length bias and tends to select shorter sentences as summary candidates. To address this length bias, this thesis proposes an extractive summarization framework based on a dimension-level attention mechanism. Specifically, the algorithm introduces features at a finer granularity than whole word embeddings, attending over tokens separately in each embedding dimension to reduce the proportion of non-key information; the resulting sentence embeddings carry more complete information and richer feature values. The sentence embeddings are then fed through activation functions and a neural network to produce the extractive summarization result (a minimal code sketch of this pooling mechanism is also given below).

Finally, the proposed methods are evaluated on three public text summarization datasets: CNN/DM, XSum, and WikiHow. Experimental results show that both proposed summarization algorithms exceed the baseline models in summary quality, verifying the effectiveness and superiority of the proposed methods.
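As one possible illustration of the representation-correction idea summarized above, the sketch below scores each embedding dimension by its correlation with word frequency and removes the frequency-predictable component from the most affected dimensions. The thesis itself uses a clustering algorithm and a neighborhood measurement method, which are not reproduced here; the correlation-based scoring, the `top_k` parameter, and the `correct_frequency_bias` function are illustrative assumptions, not the author's implementation.

```python
# Rough sketch of frequency-bias correction: locate the embedding dimensions
# most sensitive to word frequency and neutralise them. The Pearson-style
# scoring and the simple regression-residual step are assumptions for
# illustration, not the thesis's exact algorithm.
import numpy as np

def correct_frequency_bias(embeddings: np.ndarray,
                           frequencies: np.ndarray,
                           top_k: int = 8) -> np.ndarray:
    """embeddings: (vocab_size, dim) vectors from a pre-trained model.
    frequencies: (vocab_size,) corpus frequency of each word."""
    log_freq = np.log1p(frequencies)                 # dampen heavy-tailed counts
    centered = embeddings - embeddings.mean(axis=0)  # remove the global mean

    # Per-dimension correlation with log frequency: a proxy for how strongly
    # each coordinate encodes word frequency rather than semantics.
    lf = log_freq - log_freq.mean()
    corr = (centered * lf[:, None]).sum(axis=0)
    corr /= np.linalg.norm(centered, axis=0) * np.linalg.norm(lf) + 1e-12

    # Pick the top_k most frequency-sensitive dimensions ...
    biased_dims = np.argsort(-np.abs(corr))[:top_k]

    # ... and subtract the component of each that is linearly predictable
    # from log frequency, leaving the remaining dimensions untouched.
    corrected = centered.copy()
    for d in biased_dims:
        beta = centered[:, d] @ lf / (lf @ lf + 1e-12)
        corrected[:, d] = centered[:, d] - beta * lf
    return corrected
```

The corrected embeddings would then replace the original pre-trained embeddings in the summarization model, so that generation probabilities are driven less by raw word frequency; how the thesis wires the corrected representations into the generator is not specified in the abstract.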
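The dimension-level attention pooling can be pictured along the following lines: instead of mean pooling, each embedding dimension receives its own attention distribution over the tokens of a sentence, so long sentences do not dilute key information uniformly. This is a minimal PyTorch sketch under assumed shapes; the `DimensionLevelAttentionPooling` class, the single-layer scorer, and the 768-dimensional example are hypothetical and may differ from the thesis's architecture.

```python
# Minimal sketch of dimension-level attention pooling for sentence embeddings.
import torch
import torch.nn as nn

class DimensionLevelAttentionPooling(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Produces one attention score per token *per embedding dimension*.
        self.scorer = nn.Linear(dim, dim)

    def forward(self, word_embeddings: torch.Tensor,
                mask: torch.Tensor) -> torch.Tensor:
        """word_embeddings: (batch, seq_len, dim); mask: (batch, seq_len), 1 for real tokens."""
        scores = self.scorer(torch.tanh(word_embeddings))           # (B, L, D)
        scores = scores.masked_fill(mask.unsqueeze(-1) == 0, -1e9)  # ignore padding
        weights = torch.softmax(scores, dim=1)                      # normalise over tokens, per dimension
        return (weights * word_embeddings).sum(dim=1)               # (B, D) sentence embedding

# Usage: pool BERT-style token embeddings into sentence embeddings; a small
# feed-forward scorer over these embeddings would then pick summary sentences.
pool = DimensionLevelAttentionPooling(dim=768)
tokens = torch.randn(2, 40, 768)
mask = torch.ones(2, 40)
sentence_embeddings = pool(tokens, mask)  # shape (2, 768)
```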
Keywords/Search Tags: Abstractive summarization, extractive text summarization, representation correction algorithm, dimension-level attention mechanism, representation learning