
Research On Chinese Abstractive Summarization Via Incorporating Implicit Topic Information

Posted on: 2022-10-05    Degree: Master    Type: Thesis
Country: China    Candidate: H Zeng    Full Text: PDF
GTID: 2518306350953099    Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the big-data era, the amount of text on the Internet grows exponentially every day, and it is difficult for people to extract the important information from such massive texts efficiently. Automatic text summarization has therefore become an urgent need. With the development of deep learning and the emergence of large-scale datasets, research on automatic summarization has made significant progress and has gradually shifted from extractive to abstractive methods. Extractive methods form a summary by selecting important sentences from the text, whereas abstractive methods build on a semantic understanding of the text to produce a summary that is as concise and fluent as possible while remaining faithful to the original. Abstractive generation is more flexible, and its output often contains words or fragments that do not appear in the source text, but it is prone to prominent problems such as inaccurate generation and missing key information, which degrade summary quality. To address these issues, this thesis proposes to incorporate implicit topic information into the summarization model so that it can generate higher-quality summaries that cover the topics of the original text as fully as possible. The main research work of this thesis is as follows:

Firstly, existing models mostly use topic keywords or key sentences to guide the summarizer toward topic-consistent output. However, current techniques often fail to extract all the keywords of the original text, and a set of discrete keywords cannot capture the semantic connections between the important information and the topics of the text. Moreover, relying only on keywords easily ignores the deeper semantics of the text, especially the guiding role of implicit topic information in summarization. Therefore, this thesis proposes to obtain
the implicit topic information of the text through a variational auto-encoding topic model and then integrate it into the attention mechanism of a pointer-based abstractive summarization model to influence the decoding output, so that the model can generate summaries better aligned with the topic semantics under the guidance of global implicit topic information. Experimental results on NLPCC, an open Chinese single-document summarization dataset, verify the feasibility and effectiveness of the proposed model.

Secondly, a text often contains many different subtopics whose distributions differ considerably, and each word in the text is related to those subtopics to a different degree. Consequently, if the subtopic information relevant to different words can be distinguished, the quality of the summary can be expected to improve further. In view of this, this thesis introduces an implicit-topic attention mechanism that computes an attention distribution over all subtopics at each time step of the summarization model, so as to select the subtopic information most relevant to the current generation. Compared with global implicit information, the advantage of this mechanism is that it can flexibly obtain and distinguish relevant local topic information at different time steps; specifically, the implicit-topic context vector of the current time step is integrated into the vocabulary probability distribution to produce the final decoding output. In addition, since manually written reference summaries usually cover the key topic information well, the topic information of the reference summary is introduced during the training phase to strengthen the filtering of irrelevant information. Experimental results and analyses on NLPCC demonstrate the effectiveness of the proposed method.
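The variational auto-encoding topic model mentioned in the first contribution can be illustrated with a minimal NVDM-style sketch: a bag-of-words vector is encoded into a latent topic mixture `theta` via the reparameterization trick, and the ELBO (reconstruction term plus KL divergence) is the training loss. This is a generic illustration in PyTorch, not the thesis's implementation; the class name `TopicVAE` and all dimensions are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopicVAE(nn.Module):
    """Minimal neural variational topic model (NVDM-style sketch).

    Encodes a bag-of-words vector into a K-dimensional topic mixture
    theta, which could then condition a summarizer's attention.
    All names and sizes here are illustrative assumptions.
    """
    def __init__(self, vocab_size, hidden_size=256, num_topics=50):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(vocab_size, hidden_size), nn.ReLU())
        self.mu = nn.Linear(hidden_size, num_topics)
        self.logvar = nn.Linear(hidden_size, num_topics)
        self.decoder = nn.Linear(num_topics, vocab_size)  # topic-word logits

    def forward(self, bow):
        h = self.encoder(bow)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: sample z while keeping gradients.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        theta = torch.softmax(z, dim=-1)          # document-topic mixture
        recon_logits = self.decoder(theta)        # reconstruct bag-of-words
        # ELBO terms: reconstruction + KL(q(z|x) || N(0, I)).
        recon = -(F.log_softmax(recon_logits, dim=-1) * bow).sum(-1)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)
        return theta, (recon + kl).mean()

bow = torch.rand(4, 1000)            # batch of 4 documents, |V| = 1000
model = TopicVAE(vocab_size=1000)
theta, loss = model(bow)
print(theta.shape)                   # torch.Size([4, 50])
```

The global vector `theta` is what would be fed into the pointer model's attention as the "global implicit topic information" the abstract describes.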
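The second contribution, attending over subtopics at each decoding step and folding the resulting topic context vector into the vocabulary distribution, can be sketched as follows. This is a hedged illustration under assumed names and dimensions (`TopicAttention`, `topic_emb`, etc.), not the thesis's actual architecture.

```python
import torch
import torch.nn as nn

class TopicAttention(nn.Module):
    """Sketch of per-step attention over K subtopic embeddings.

    The attended topic context vector is concatenated with the decoder
    state before the output softmax, so the vocabulary distribution at
    each time step is conditioned on the locally relevant subtopics.
    Names and sizes are illustrative assumptions.
    """
    def __init__(self, num_topics, topic_dim, dec_dim, vocab_size):
        super().__init__()
        self.topic_emb = nn.Parameter(torch.randn(num_topics, topic_dim))
        self.score = nn.Linear(dec_dim + topic_dim, 1)
        self.out = nn.Linear(dec_dim + topic_dim, vocab_size)

    def forward(self, dec_state):
        B = dec_state.size(0)
        K = self.topic_emb.size(0)
        topics = self.topic_emb.unsqueeze(0).expand(B, -1, -1)   # (B, K, Dt)
        states = dec_state.unsqueeze(1).expand(-1, K, -1)        # (B, K, Dd)
        # Attention weights over subtopics for the current time step.
        scores = self.score(torch.cat([states, topics], -1)).squeeze(-1)
        alpha = torch.softmax(scores, dim=-1)                    # (B, K)
        topic_ctx = torch.bmm(alpha.unsqueeze(1), topics).squeeze(1)
        # Topic context joins the decoder state before the vocab softmax.
        vocab_dist = torch.softmax(
            self.out(torch.cat([dec_state, topic_ctx], -1)), dim=-1)
        return vocab_dist, alpha

attn = TopicAttention(num_topics=50, topic_dim=64, dec_dim=128,
                      vocab_size=1000)
dist, alpha = attn(torch.randn(4, 128))
print(dist.shape, alpha.shape)  # torch.Size([4, 1000]) torch.Size([4, 50])
```

Because `alpha` is recomputed at every decoding step, the model can emphasize different subtopics for different output words, which is the advantage over a single global topic vector that the abstract highlights.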
Keywords/Search Tags:variational auto-encoding topic model, attention mechanism, implicit topic, pointer abstractive summarization model