ToC-RWG: Explore The Combination Of Topic Model And Citation Information For Automatic Related Work Generation

Posted on: 2020-10-27  Degree: Master  Type: Thesis
Country: China  Candidate: P C Wang
GTID: 2518306548494444  Subject: Computer Science and Technology
Abstract/Summary:
The related work section is a significant component of a scientific paper. In this section, scholars need to situate their work within the relevant research scope and highlight their contributions. A high-quality related work section requires surveying the relevant research by reading a large number of papers, summarizing the relevant aspects of that research, and pointing out its weaknesses compared with one's own work, which tends to be an arduous and time-consuming job. In view of this, automatic related work generation has been proposed to generate the related work section for a paper being written.

In this thesis, we use a topic model to capture the relevance between the target paper and its reference papers. We propose Query Topic Sum to describe the generative process of the target paper and its reference papers. For the target paper, each word is modeled as being generated from a mixture of a background distribution $\phi_B$ and a document-specific distribution $\phi_D$, while for each reference paper, each word is generated from a mixture of the background distribution $\phi_B$, a document-specific distribution $\phi_D$, and a target-reference distribution $\phi_T$. We use $\phi_T$ to describe the relevance between the target paper and the reference papers.

Subsequently, this thesis introduces external citation information. We provide each reference paper with 3 to 20 citations and use these citations to identify the most relevant passages of the reference paper, which we call cited text spans (CTS). We propose a two-layer ensemble model to identify CTS automatically, and our method achieves state-of-the-art performance on the CL-SciSumm dataset. With this CTS identification technique, we find the CTS of each reference paper based on its citations.

We integrate the topic model and the citation information into a unified optimization framework, ToC-RWG. With $\phi_T$ as the target distribution and the CTS as candidate sentences, ToC-RWG applies a greedy algorithm that, at each step, selects the sentence that minimizes the KL divergence between the target distribution and the approximating distribution. After post-processing, we obtain the final generated related work section. Our evaluation on a set of 50 scientific papers and their corresponding reference papers shows that our model achieves a considerable improvement over generic multi-document summarization and scientific summarization baselines.
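The word-level generative story of Query Topic Sum can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the thesis's implementation: the toy vocabularies, the Dirichlet prior `alpha` on per-document mixing weights (borrowed from TopicSum-style models), and the helper names `sample_dirichlet` and `generate_document` are all hypothetical. It only shows that a target paper draws each word from $\phi_B$ or $\phi_D$, while a reference paper may also draw from $\phi_T$; posterior inference (for example, Gibbs sampling) is not shown.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw mixing weights from a Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(n_words, components, alpha, rng):
    """Generate n_words words from a per-document mixture of word distributions.

    components: list of (name, {word: prob}) pairs, e.g. background ("B"),
    document-specific ("D") and, for reference papers, target-reference ("T").
    Returns (component_name, word) pairs so each word's source is visible.
    """
    mix = sample_dirichlet(alpha, rng)  # hypothetical per-document switch weights
    doc = []
    for _ in range(n_words):
        # First choose which distribution generates this word, then draw the word.
        name, dist = rng.choices(components, weights=mix, k=1)[0]
        words = list(dist)
        word = rng.choices(words, weights=[dist[w] for w in words], k=1)[0]
        doc.append((name, word))
    return doc

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy distributions; not taken from the thesis.
    phi_B = {"the": 0.5, "we": 0.3, "paper": 0.2}
    phi_D_target = {"generation": 0.6, "framework": 0.4}
    phi_D_ref = {"lexrank": 0.7, "graph": 0.3}
    phi_T = {"summarization": 0.6, "citation": 0.4}

    # Target paper: mixture of background and document-specific only.
    target = generate_document(8, [("B", phi_B), ("D", phi_D_target)], [1.0, 1.0], rng)
    # Reference paper: additionally mixes in the target-reference distribution.
    reference = generate_document(
        8, [("B", phi_B), ("D", phi_D_ref), ("T", phi_T)], [1.0, 1.0, 1.0], rng)
    print(target)
    print(reference)
```

The design point the sketch makes concrete is that only reference-paper words can be attributed to $\phi_T$, so words such a model assigns to $\phi_T$ are the ones shared between the target paper's concerns and the reference papers.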
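The sentence-selection step of ToC-RWG resembles KLSum-style greedy summarization. The following is a minimal sketch under stated assumptions, not ToC-RWG's actual code: it represents the partial summary by an additively smoothed unigram distribution $P_S$ (the smoothing constant and the word budget `max_words` are hypothetical), measures $\mathrm{KL}(\phi_T \,\|\, P_S)$, and at each step adds the candidate CTS sentence that gives the lowest divergence while still fitting the budget. Post-processing is not shown.

```python
import math
from collections import Counter

def unigram_distribution(sentences, vocab, smoothing=1e-3):
    """Additively smoothed unigram distribution of the selected sentences over vocab."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts[w] for w in vocab) + smoothing * len(vocab)
    return {w: (counts[w] + smoothing) / total for w in vocab}

def kl_divergence(p, q):
    """KL(p || q); terms with p(w) = 0 contribute nothing."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)

def greedy_kl_select(target, candidates, max_words):
    """Greedily add the candidate that minimizes KL(target || summary distribution)."""
    vocab = set(target) | {w for s in candidates for w in s.split()}
    target_full = {w: target.get(w, 0.0) for w in vocab}
    summary, length, remaining = [], 0, list(candidates)
    while remaining:
        best, best_kl = None, math.inf
        for s in remaining:
            n = len(s.split())
            if length + n > max_words:  # hypothetical length budget in words
                continue
            kl = kl_divergence(target_full, unigram_distribution(summary + [s], vocab))
            if kl < best_kl:
                best, best_kl = s, kl
        if best is None:  # no remaining sentence fits the budget
            break
        summary.append(best)
        length += len(best.split())
        remaining.remove(best)
    return summary

if __name__ == "__main__":
    # Hypothetical target distribution (standing in for phi_T) and CTS candidates.
    phi_T = {"citation": 0.5, "summarization": 0.5}
    cts = ["we study citation summarization",
           "the weather is nice today",
           "citation summarization citation summarization"]
    print(greedy_kl_select(phi_T, cts, max_words=8))
```

Smoothing matters here because $\mathrm{KL}(\phi_T \,\|\, P_S)$ is infinite whenever the summary assigns zero probability to a word that $\phi_T$ supports.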
Keywords/Search Tags: automatic related work generation, topic model, citation, cited text spans, ensemble model