
Research On Automatic Text Summarization Methods Incorporating Multi-domain Information

Posted on: 2024-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: Z L Duan
Full Text: PDF
GTID: 2558307181453954
Subject: Computer application technology

Abstract/Summary:
Automatic text summarization uses computer techniques to produce a short, coherent text that preserves the main idea of the original document. Recently, deep learning has brought significant improvements to automatic summarization methods. However, model limitations, such as the weak modeling of long-range dependencies in long short-term memory (LSTM) networks over long text sequences, can cause information loss, so the generated summaries often deviate from the main idea of the source text and read poorly. To address this, this thesis studies and proposes automatic summarization methods that incorporate multi-domain information. The main research contents of this thesis are as follows.

(1) Existing abstractive summarization methods attend only to the relevance between source words and summary words and ignore the influence of the source's topics on the summary. To address this, we propose ACGT, an abstractive summarization method that incorporates global topic information. A topic information extractor based on latent Dirichlet allocation (LDA) extracts the key topic information of the source text; an attention module fuses this topic information with the source representation; and a pointer-generator network with a coverage mechanism then generates the summary. Experiments on two public datasets, CNN/Daily Mail (English) and LCSTS (Chinese), verify the effectiveness of the method. On CNN/Daily Mail, ACGT improves over the baseline model by 0.96%, 2.44%, and 1.03% on ROUGE-1, ROUGE-2, and ROUGE-L, respectively; on LCSTS, the improvements are 1.19%, 1.03%, and 0.85%.

(2) To address the incorrect syntactic structure of the sequences generated by existing abstractive summarization methods, a method that incorporates word-level information is proposed. Building on (1), the source word vectors are first obtained from a pretrained BERT model. A word-level feature extraction module then extracts hierarchical phrase information through a grouped convolutional network, and the resulting local features are passed through a bidirectional long short-term memory (BiLSTM) network to capture global dependencies among words and sentences. Finally, these features are fed to the decoder to guide summary generation. On CNN/Daily Mail, this method improves over the baseline model by 1.52%, 0.45%, and 1.82% on ROUGE-1, ROUGE-2, and ROUGE-L, respectively; on LCSTS, the improvements are 3.26%, 0.11%, and 2.43%.

The experiments show that fusing multi-domain information improves both the coverage of the source content and the fluency of the summaries. Overall, the research in this thesis integrates word-level and document-level text features, which enriches the information representation of the source text and helps the model generate summary sentences that are both informative and grammatically correct. In future work, the hidden information of the text could be mined from other perspectives to further enrich its semantic representation.
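Both sets of experiments report ROUGE-1, ROUGE-2, and ROUGE-L. As a minimal illustration of how these overlap metrics are computed, the following is a plain-Python sketch assuming simple whitespace tokenization; the function names `rouge_n` and `rouge_l` are my own, not identifiers from the thesis:

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=1):
    """ROUGE-N F1: clipped n-gram overlap between candidate and reference."""
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    if not cand or not ref:
        return 0.0
    overlap = sum((cand & ref).values())  # each n-gram counted at most min(cand, ref) times
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall) if overlap else 0.0

def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence (LCS) of tokens."""
    c, r = candidate.split(), reference.split()
    # Dynamic-programming table for LCS length.
    dp = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
    for i, x in enumerate(c, 1):
        for j, y in enumerate(r, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge_n("the cat sat", "the cat sat on the mat")` gives precision 1.0 and recall 0.5, hence F1 ≈ 0.667. Published results typically use the standard ROUGE toolkit with stemming and proper tokenization, so this sketch is illustrative rather than a reproduction of the thesis's evaluation setup.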
Keywords/Search Tags: Automatic text summarization, Topic model, Convolutional neural network, Pointer network, Coverage mechanism