Research On Automatic Text Summarization Based On Deep Neural Networks

Posted on: 2022-01-08
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Fu
Full Text: PDF
GTID: 2518306323478674
Subject: Computer application technology
Abstract/Summary:
In recent years, automatic text summarization has become an important research direction in artificial intelligence and natural language processing. It aims to extract the key information from a source text and generate a fluent, concise summary, so that users can browse information more efficiently. With the development of deep learning, most current summarization models are built on sequence-to-sequence frameworks. Applying these frameworks to summarization still raises several problems: out-of-vocabulary words are difficult to generate, the connections between words are not modeled effectively, and the process of extracting key information is not modeled. To address these problems, this thesis improves the automatic text summarization model based on the sequence-to-sequence framework. The main research contents are the following two points.

(1) We propose an abstractive text summarization method based on improved subword units. An improved subword segmentation algorithm splits each complete word into subword units, and the vocabulary is built from these units, which reduces its size. Because words are split into subword units, words with the same meaning but different morphology, such as forms that differ in number or tense, show stronger associations. Because an out-of-vocabulary word can be composed from subword units, the algorithm also alleviates the problem of generating out-of-vocabulary words. We conducted experiments on the Gigaword, CNN/Daily Mail, and XSum datasets. The results verify that the method models the connections between words better and alleviates the difficulty of generating out-of-vocabulary words.

(2) We propose an abstractive text summarization method based on hierarchical information filtering. A dynamic routing algorithm computes a global vector from the encoder output, and this global vector guides the hierarchical information filtering algorithm. The algorithm filters the information in the source text at two levels: the word level and the semantic level. First, the global vector and the encoder output are used to calculate a weight for each word in the input text, and keywords are selected according to these weights. Then a dual-gate unit filters the semantic noise in the input text. The dual-gate unit contains two gates: a filter gate and a complementary gate. The filter gate performs an initial filtering of the semantic noise, and the complementary gate adds part of the original information back to the filtered representation to form the final text representation, which avoids over-filtering the input text. We conducted experiments on the Gigaword and CNN/Daily Mail datasets, and the results validate the effectiveness of the proposed method for noise filtering.
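The subword idea in point (1) can be illustrated with a small sketch. The abstract does not specify the improved segmentation algorithm, so the code below uses plain byte-pair encoding as a stand-in. The function names (`learn_bpe`, `merge_pair`, `segment`), the `</w>` end-of-word marker, and the toy corpus are all assumptions made for illustration, not the thesis's method.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs, num_merges):
    """Learn merge rules with plain byte-pair encoding (a stand-in, not the improved variant)."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair; ties are broken by the pair itself so the result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def segment(word, merges):
    """Split a word, possibly one never seen in training, by replaying the learned merges."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Hypothetical toy corpus: word -> frequency.
corpus = {"walk": 6, "walked": 3, "walking": 3, "talk": 4, "talked": 2}
merges = learn_bpe(corpus, num_merges=8)
for w in ["walked", "talking", "stalked"]:  # the last two never appear in the corpus
    print(w, "->", segment(w, merges))
```

Because any word can fall back to smaller units, ultimately single characters, a word missing from the training corpus still receives a segmentation. This is the property the abstract relies on for out-of-vocabulary words.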
Keywords/Search Tags:Automatic Text Summarization, Subword Units, Noise Filtering, Deep Neural Network, Natural Language Processing
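The two filtering levels described in point (2) of the abstract can be sketched as follows. The abstract gives no formulas, so every detail here is an assumption: mean pooling stands in for the dynamic routing step, word weights are a softmax over dot products with the global vector, and the final representation is taken as f * h + c * h, where f is the filter gate and c is the complementary gate. All names (`mean_pool`, `word_weights`, `dual_gate`, `top_k_words`, `W_f`, `W_c`) are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(W, x):
    return [dot(row, x) for row in W]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(H):
    """Assumed stand-in for dynamic routing: average the encoder states into a global vector."""
    n = len(H)
    return [sum(col) / n for col in zip(*H)]


def word_weights(H, g):
    """Word level (assumed form): one weight per token from its similarity to the global vector."""
    return softmax([dot(h, g) for h in H])


def top_k_words(tokens, weights, k):
    """Keep the k highest-weighted tokens, returned in their original order."""
    ranked = sorted(range(len(tokens)), key=lambda i: weights[i], reverse=True)
    return [tokens[i] for i in sorted(ranked[:k])]


def dual_gate(H, g, W_f, W_c):
    """Semantic level (assumed form): filter gate, then complementary gate adds original info back."""
    out = []
    for h in H:
        x = h + g  # list concatenation [h; g]
        f = [sigmoid(z) for z in matvec(W_f, x)]  # filter gate
        c = [sigmoid(z) for z in matvec(W_c, x)]  # complementary gate
        filtered = [fi * hi for fi, hi in zip(f, h)]
        out.append([fl + ci * hi for fl, ci, hi in zip(filtered, c, h)])
    return out


# Hypothetical inputs: random vectors in place of real encoder outputs and trained weights.
random.seed(0)
d = 4
tokens = ["police", "said", "the", "suspect", "fled"]
H = [[random.uniform(-1, 1) for _ in range(d)] for _ in tokens]
W_f = [[random.uniform(-1, 1) for _ in range(2 * d)] for _ in range(d)]
W_c = [[random.uniform(-1, 1) for _ in range(2 * d)] for _ in range(d)]

g = mean_pool(H)
weights = word_weights(H, g)
print(top_k_words(tokens, weights, 2))
print(dual_gate(H, g, W_f, W_c)[0])
```

In this form the complementary gate can only restore information the filter gate suppressed, which matches the abstract's stated purpose of avoiding over-filtering. The actual gating equations in the thesis may differ.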