
Research On Grammar-aware English Text Summarization Based On Deep Learning

Posted on: 2022-07-23
Degree: Master
Type: Thesis
Country: China
Candidate: F Cao
Full Text: PDF
GTID: 2505306575465594
Subject: Computer Science and Technology
Abstract/Summary:
The popularity of the Internet eases the flow of information in daily life, but the volume of text data now far exceeds what people can absorb. Extracting concise, important information from this text is a meaningful and challenging task. With the development of natural language processing, automatic text summarization has become one of the hot topics in the field. In recent years, most research has focused on the semantic relationship between the source text and the summary, and the generated summaries often lack grammatical well-formedness. As a carrier of language, text must not only convey the semantics of a specific context but also obey the grammar rules of the language. This thesis studies summary generation from both the semantic and the grammatical perspective. The specific research contents are as follows:
(1) The summary sentences in the datasets have simple grammatical structures; they are mostly simple sentences such as subject + predicate + object or subject + predicate + object + object complement. Therefore, when the summaries are processed, each word is annotated with a part-of-speech (POS) tag and a dependency (DEP) tag. During training, the words and the two tag sequences are fed into their corresponding neural networks, and at each decoder time step the model predicts the current word together with its POS and DEP tags. After training on a large number of samples, the model can grasp the grammar rules of the simple sentences in the summaries.
(2) Analysis of the datasets shows that some rare words in the source text, such as person names, place names, organization names, and numbers, appear in the corresponding summary with high probability. In summaries made up of simple sentences, they are especially likely to occur in the subject, object, and object complement. In view of this, the thesis designs a grammar pointer network, which makes the model tend to copy person names, place names, organization names, and numbers directly from the source text at the subject, object, and object complement positions when generating the summary.
The experimental results show that, compared with the baseline model, the ROUGE scores improve by 3.3%, 1.32%, and 5.3% on the public CNN/Daily Mail dataset and by 2.76%, 1.63%, and 3.7% on the Gigaword dataset. Through these two lines of research, this thesis proposes a grammar-aware pointer network for abstractive summarization, which not only tracks the key semantics of the source text but also lets the model fit the grammar rules of the summaries.
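The copying behaviour described in (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: it uses the standard pointer-generator mixture, P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (attention mass on source positions holding w), and adds an assumed grammar gate that shrinks p_gen when the predicted dependency label marks a subject, object, or object complement slot. The names `COPY_SLOTS`, `final_distribution`, and `copy_boost`, and the label strings themselves, are hypothetical.

```python
from collections import defaultdict

# Hypothetical dependency labels treated as copy-friendly slots
# (subject, object, object complement). The label names are assumptions.
COPY_SLOTS = {"nsubj", "dobj", "oprd"}


def final_distribution(p_vocab, attention, source_tokens, p_gen,
                       dep_label, copy_boost=0.5):
    """Mix a generation distribution with a copy distribution.

    p_vocab:       dict mapping word -> generation probability (sums to 1)
    attention:     attention weights over source_tokens (sums to 1)
    source_tokens: tokens of the source text
    p_gen:         base probability of generating instead of copying
    dep_label:     dependency label predicted for the current output position
    copy_boost:    assumed factor in [0, 1] that shrinks p_gen in copy slots
    """
    # Assumed grammar gate: favour copying in subject/object positions.
    if dep_label in COPY_SLOTS:
        p_gen = p_gen * (1.0 - copy_boost)

    dist = defaultdict(float)
    # Generation part: scale the vocabulary distribution by p_gen.
    for word, prob in p_vocab.items():
        dist[word] += p_gen * prob
    # Copy part: add attention mass to each source token, summing repeats.
    for token, weight in zip(source_tokens, attention):
        dist[token] += (1.0 - p_gen) * weight
    return dict(dist)


if __name__ == "__main__":
    p_vocab = {"the": 0.6, "said": 0.4}
    source = ["Obama", "said", "Obama"]
    attention = [0.5, 0.2, 0.3]
    for label in ("nsubj", "ROOT"):
        dist = final_distribution(p_vocab, attention, source, 0.8, label)
        print(label, max(dist, key=dist.get))
```

In this toy input, the out-of-vocabulary name "Obama" receives the largest probability only when the position is labelled as a subject. In a trained model the gate would be learned rather than fixed by a constant such as `copy_boost`.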
Keywords/Search Tags: text summarization, abstractive, grammar-aware, pointer network