| With the rapid development of the Internet and digital technology, the volume of scientific and technological literature has grown explosively. This growth has created an information overload problem that poses a major challenge for researchers. How to acquire and integrate the important viewpoints contained in a large body of literature has become an active research topic. Automatic summarization is an important technology in natural language processing: it uses computers to extract the important content and viewpoints of documents automatically, thereby compressing information. This paper therefore focuses on scientific and technological literature, which follows strict structural conventions, and studies multi-document extractive summarization, with the aim of helping researchers obtain useful information quickly, reduce duplicated research investment, and improve research efficiency. Building on a survey and analysis of the current state of automatic summarization research domestically and internationally, and on a review of related concepts and methods, this paper studies multi-document summary extraction for scientific and technological literature in depth from the following aspects: (1) Drawing on the ideas and process of manual literature review, a summary extraction method is proposed and constructed. The method has four parts: document data preparation, document division based on research topics, multi-document summary extraction within the same topic based on chapter rhetorical function, and final summary generation. First, scientific and technological literature in a given field or on a given topic is acquired, cleaned, and sorted. Next, the documents are divided into research-topic document
clusters with the help of the K-means text clustering algorithm. Then, for the literature under each research topic, summaries are extracted based on the rhetorical structure of the documents and redundant information is removed, yielding a multi-document summary of that topic. Finally, the multi-document summaries of the topic clusters are integrated and aggregated into a final summary of the field. (2) Building on the strict rhetorical function structure of scientific and technological literature, this paper explores a summary extraction method based on textual rhetorical structure. After chapter division, Doc2vec is combined with a text similarity calculation to extract the important viewpoint sentences for the rhetorical structure of each chapter, which together form the summary of the document. (3) This paper carries out a comparative experimental analysis of summary extraction methods for scientific and technological literature. First, the experimental corpus is acquired and cleaned. Then, the proposed method is compared with two commonly used multi-document summarization methods, SumBasic and LexRank. The experimental results show that the proposed method outperforms both baselines, improving ROUGE-1, ROUGE-2, and ROUGE-L by 0.1362, 0.0499, and 0.00665 on average, respectively. These results indicate that the proposed summary extraction method for scientific and technological literature is sound, effective, and feasible. |
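
The sentence selection with redundancy removal and the ROUGE evaluation described above can be illustrated with a minimal sketch. It is not the paper's implementation. Bag-of-words cosine similarity stands in for the Doc2vec embeddings as an assumption made to keep the example dependency-free, the function names and the `redundancy_threshold` parameter are hypothetical, and the recall-oriented ROUGE variants are assumed because the abstract does not state which variant was reported.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_sentences(sentences, k=2, redundancy_threshold=0.7):
    # Assumption: bag-of-words vectors replace Doc2vec embeddings here.
    vectors = [Counter(tokenize(s)) for s in sentences]
    centroid = Counter()
    for vec in vectors:
        centroid.update(vec)
    # Rank sentences by similarity to the centroid of the whole group.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine(vectors[i], centroid),
                    reverse=True)
    chosen = []
    for i in ranked:
        if len(chosen) >= k:
            break
        # Skip a sentence that is too similar to one already chosen.
        if all(cosine(vectors[i], vectors[j]) < redundancy_threshold
               for j in chosen):
            chosen.append(i)
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    # Clipped n-gram overlap divided by the number of reference n-grams.
    cand = ngrams(tokenize(candidate), n)
    ref = ngrams(tokenize(reference), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


def lcs_length(a, b):
    # Longest common subsequence length via row-by-row dynamic programming.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_recall(candidate, reference):
    # LCS length divided by the reference length.
    ref = tokenize(reference)
    if not ref:
        return 0.0
    return lcs_length(tokenize(candidate), ref) / len(ref)
```

In this sketch, `select_sentences` would be applied to the sentences of one rhetorical section within one topic cluster, and the ROUGE functions would compare the resulting summary with a reference summary. Swapping in real Doc2vec vectors would only change how `vectors` is built, while the ranking and redundancy filter would stay the same.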