Research And Implementation Of Automatic Extractive Summarization On Medical Papers

Posted on:2024-05-10

Degree:Master

Type:Thesis

Country:China

Candidate:S Y Qi

GTID:2568306914982649

Subject:Intelligent Science and Technology

Abstract/Summary:

In natural language processing, automatic summarization compresses long documents and extracts their important information to support downstream information retrieval and storage. Among summarization tasks, long-document summarization has been a research hotspot because of its long inputs, difficult semantic analysis, and low compression rate. Academic papers are the main representative of long documents, and the number of medical papers has grown rapidly in recent years with the emergence of coronavirus disease 2019 (COVID-19). Although existing extractive summarization techniques for long papers have made some progress, the following problems remain: (1) Recent extractive methods tend to adopt attention mechanisms based on sequential relationships and ignore the section relationships of a paper, producing summaries with a serious head-distribution problem, in which the selected sentences concentrate near the beginning of the document. (2) The specialized domain knowledge of the papers is not considered, so content understanding is limited, which to some extent keeps the system from achieving better results. This thesis studies extractive summarization for medical papers in depth to address these problems. First, it analyzes the data distribution, optimizes the automatic label-construction algorithm, and statistically examines the structural correlation between the summary and the source document. Next, guided by this structural distribution and by medical entity knowledge, it designs a heterogeneous graph based on the explicit writing structure and implicit knowledge structure of the source document. The graph contains sentence nodes, entity nodes, and section nodes, together with semantically rich edges, so that it captures cross-sentence and cross-section logic while analyzing intra-sentence semantic features, yielding more comprehensive summaries. The thesis also provides CORD-SUM, a large-scale academic paper dataset whose main research subject is coronavirus. Experiments on CORD-SUM show that, compared with previous work, the proposed model, SAPGraph, generates more comprehensive summaries with higher similarity scores against the reference. SAPGraph also achieves better results on arXiv, another multi-domain academic paper dataset. In addition, the thesis provides a demo system that summarizes long papers in real time and presents the graph modeling structures. In conclusion, this thesis constructs a structure- and knowledge-aware heterogeneous graph to improve the comprehensiveness and accuracy of extractive summarization, and obtains effective automatically produced summaries of long medical papers.
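
The graph described in the abstract, with sentence, entity, and section nodes, can be illustrated with a minimal Python sketch. This is not the thesis's SAPGraph implementation. The class `HeteroGraph`, the function `build_graph`, the edge-type names (`in_section`, `mentions`, `section_section`), and the substring-based entity matching are all assumptions made for illustration. The abstract does not specify how entities are recognized or which edge types are used.

```python
from dataclasses import dataclass, field


@dataclass
class HeteroGraph:
    """Hypothetical container: typed nodes and typed edges."""
    nodes: dict = field(default_factory=dict)  # node id -> node type
    edges: set = field(default_factory=set)    # (source, target, edge type)

    def add_node(self, node_id, node_type):
        self.nodes[node_id] = node_type

    def add_edge(self, src, dst, edge_type):
        self.edges.add((src, dst, edge_type))


def build_graph(sections, entity_vocab):
    """Build a sentence/entity/section graph.

    sections: list of (title, [sentence, ...]) pairs.
    entity_vocab: set of lowercase entity strings (assumed to be given;
    a real system would use a medical entity recognizer instead).
    """
    graph = HeteroGraph()
    sentence_index = 0
    for section_index, (_title, sentences) in enumerate(sections):
        section_id = f"sec:{section_index}"
        graph.add_node(section_id, "section")
        for sentence in sentences:
            sentence_id = f"sent:{sentence_index}"
            graph.add_node(sentence_id, "sentence")
            # Explicit writing structure: each sentence belongs to a section.
            graph.add_edge(sentence_id, section_id, "in_section")
            lowered = sentence.lower()
            for entity in sorted(entity_vocab):
                # Assumption: naive substring match stands in for entity linking.
                if entity in lowered:
                    entity_id = f"ent:{entity}"
                    graph.add_node(entity_id, "entity")
                    graph.add_edge(sentence_id, entity_id, "mentions")
            sentence_index += 1
    # Assumption: every pair of sections is connected to each other.
    section_ids = [n for n, t in graph.nodes.items() if t == "section"]
    for i, first in enumerate(section_ids):
        for second in section_ids[i + 1:]:
            graph.add_edge(first, second, "section_section")
    return graph


# Toy input, invented for this example only.
sections = [
    ("Introduction", ["Coronavirus spreads quickly.", "We study summarization."]),
    ("Results", ["The coronavirus vaccine reduced fever."]),
]
graph = build_graph(sections, {"coronavirus", "vaccine", "fever"})
print(len(graph.nodes), "nodes,", len(graph.edges), "edges")
```

In this toy input, the shared entity node `ent:coronavirus` links a sentence in the first section to a sentence in the second, which is one simple way a graph can carry cross-section information that a purely sequential model would not see directly.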

Keywords/Search Tags:

extractive summarization, heterogeneous graph, long document summarization, medical paper

Related items

1	Research On Automatic Extractive Document Summarization Incorporating Heterogeneous Graph
2	Research On Extractive Summarization Of Scientific And Technological Information Text Based On Deep Learning
3	Extractive Automatic Text Summarization For Long Sequences
4	Research On Extractive Multi-document Summarization Using Supervised Deep Learning
5	Extractive Summarization For Long Documents Without Manual Annotation And Low-resource Scenarios
6	Research On Query-focused Multi-document Summarization
7	The Study On Extractive Multidocument Summarization
8	Research And Implementation Of Enterprise Document Knowledge Search System Based On Deep Learning
9	Research On User Preference Oriented Controllable Meeting Summarization Algorithm
10	Research And Implementation Of Document Summarization Based On Combined Multi-Feature