| With the rapid development of scientific research and the growing volume of scientific literature, it is time-consuming and challenging for researchers to keep up with changing trends and to innovate in a research field. This thesis studies the application of automatic text summarization to scientific research: by automatically generating comparative summaries of scientific papers, it helps researchers understand the differences and development trends in a field, eases the burden of reading literature, and provides inspiration for innovation. First, this thesis investigates comparative graph-based summarization of scientific papers guided by comparative citations. We propose a task of comparative summarization of scientific papers guided by comparative citations, a comparative scientific summarization corpus (CSSC) guided by comparative citations, and a comparative graph-based summarization (CGSUM) model for comparative scientific summarization. The task aims to generate a comparative summary for a set of related scientific papers that illustrates the commonalities and differences between them. The corpus is constructed by collecting comparable topics and related papers through clues from comparative citations and manually writing reference summaries. The model is unsupervised and evaluates sentence salience within papers, sentence difference between papers, and sentence commonality between papers and citations. By jointly considering the information carried by these three types of sentence relationships, the model selects the most suitable sentences to form the target summary. To further demonstrate the efficiency and transferability of the model, we apply it to query-based multi-document summarization and conduct experiments on the DUC 2006 and DUC 2007 datasets. The evaluation metrics in this 
study include both automatic and human evaluation metrics. The experimental results show that the proposed model outperforms the baseline model and is efficient and transferable, with low computational cost. Second, this thesis studies the influence of rhetoric on the generation of comparative scientific summaries, introduces rhetorical structure theory and cross-document structure theory, and proposes a model that combines relevance-based and rhetoric-based sentence relationships for generating comparative scientific summaries. This model retains the advantages of the comparative graph-based model by considering complex relevance-based sentence relationships, and it also analyzes the rhetoric of sentences within and across papers by introducing rhetoric-based sentence relationships. Rhetorical structure theory is used to identify sentence rhetoric within papers, and cross-document structure theory is used to analyze sentence rhetoric across papers. The resulting sentence relationship fusion model considers both relevance-based and rhetoric-based sentence relationships within and across documents, producing comparative summaries with multiple levels of semantics. Experimental results show that the improved model achieves better performance and interpretability. Experiments on the dataset constructed in this study and on public datasets verify the feasibility and transferability of the proposed model, and, compared with the baseline models, the proposed model achieves good results. This research proposes a new summarization task, comparative citation-guided comparative scientific summarization, and provides a suitable and feasible model for it. The proposed model is efficient and easy to implement, and can be applied to the generation of comparative scientific summaries. |
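
To make the three sentence relationships named in the abstract more concrete, the following is a minimal illustrative sketch, not the CGSUM model itself. It assumes bag-of-words cosine similarity as the relatedness measure and an equally weighted sum of salience (similarity to other sentences in the same paper), difference (dissimilarity to sentences in the other papers), and commonality (similarity to citation sentences). The function names, the weights, and the similarity measure are all assumptions introduced here for illustration; the abstract does not specify them.

```python
import math
import re
from collections import Counter


def bow(sentence):
    """Lowercased bag-of-words vector (assumed representation)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mean_sim(vec, others):
    """Average similarity of one vector to a list of vectors."""
    return sum(cosine(vec, o) for o in others) / len(others) if others else 0.0


def score_sentences(papers, citations, weights=(1.0, 1.0, 1.0)):
    """Score every sentence; return (score, paper_id, sentence), best first.

    papers:    dict mapping paper id -> list of sentences
    citations: list of comparative citation sentences
    weights:   hypothetical weights for (salience, difference, commonality)
    """
    w_sal, w_diff, w_com = weights
    vecs = {pid: [bow(s) for s in sents] for pid, sents in papers.items()}
    cite_vecs = [bow(c) for c in citations]
    scored = []
    for pid, sents in papers.items():
        own = vecs[pid]
        other = [v for q, vs in vecs.items() if q != pid for v in vs]
        for i, sent in enumerate(sents):
            v = own[i]
            salience = mean_sim(v, own[:i] + own[i + 1:])   # within the paper
            difference = 1.0 - mean_sim(v, other)           # against other papers
            commonality = mean_sim(v, cite_vecs)            # against citations
            total = w_sal * salience + w_diff * difference + w_com * commonality
            scored.append((total, pid, sent))
    return sorted(scored, key=lambda item: item[0], reverse=True)


def summarize(papers, citations, k=2):
    """Pick the k highest-scoring sentences as an extractive summary."""
    return [(pid, s) for _, pid, s in score_sentences(papers, citations)[:k]]
```

In this simplified form the difference term also rewards sentences that are merely off-topic, so a realistic system would need the salience and commonality terms, or a graph-based ranking step, to counterbalance it.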