
Research On The Technology Of Semantic Matching Gap Sentences Generation And The Method Of Doctor-patient Dialogue Summarization

Posted on: 2023-12-25 | Degree: Master | Type: Thesis
Country: China | Candidate: K Wu
GTID: 2544306623980449 | Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, more and more application scenarios, including literature retrieval, clinical records, and auxiliary diagnosis, place higher demands on text summarization methods. In recent years, natural language processing techniques have gradually matured, and Transformer-based models are widely applied to text summarization. Nevertheless, Transformer models accept only a limited input length, and overly long inputs lose significant semantic content. At the same time, how a pre-trained Transformer model should select sentences for masking is also a problem worth studying. Moreover, dialogue summarization has recently attracted growing attention. In doctor-patient dialogue summarization, a clinical-record summary must be generated according to the characteristics of the scenario and its intended use; such a summary is crucial for helping the doctor quickly understand the patient's condition and for supporting downstream auxiliary diagnosis. However, because research on doctor-patient dialogue summarization started late and patient privacy must be protected, existing doctor-patient dialogue datasets are insufficient for large-scale training, and the scarcity of annotated data is particularly acute. This thesis therefore studies the following problems.

(1) To address the limited input length of Transformer models and the question of which sentences to mask, this thesis proposes a text summarization method based on semantic matching gap sentence generation. When a document exceeds the input limit, truncating it independently can drop important content. A sliding-window pointer-generator network module is therefore proposed to extract the essential content. This module allows semantic information to be exchanged between windows, shortening the text while retaining more comprehensive semantic information. For summary generation, this thesis holds that the semantics of a high-quality summary should be closest to those of the document, so the group of sentences selected for masking should also be semantically close to the document. Accordingly, a semantic matching module and a gap sentence generation module are proposed: the former selects candidate sentence groups with high semantic matching scores, and the latter masks the selected sentences in the document and trains the model to predict them, so that it learns more comprehensive semantic content and produces higher-quality summaries. Experiments on several datasets verify that the method improves the quality of long-text summaries.

(2) To address the scarcity of annotated data for doctor-patient dialogue summarization, this thesis proposes a topic-structure-based self-supervised learning method for doctor-patient dialogue summarization, exploiting the fact that doctor-patient dialogues revolve around specific topics. First, the topic structure of a doctor-patient dialogue is divided into symptoms, personal attributes, medication, examination results, and past medical history, so that the model pays more attention to these contents. Second, based on the principle that the original dialogue and its summary should yield similar diagnostic results, the method constructs two auxiliary tasks, diagnostic result extraction-generation and diagnostic result classification, whose training provides inherent supervision signals. Experiments on the AMI and MedDialog datasets verify the effectiveness of the method.
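The abstract does not state the window size, the overlap, or how the pointer-generator network exchanges information between windows. The sketch below covers only the windowing step, under the assumption that consecutive windows overlap by a fixed number of tokens so that content near a boundary is not cut off as it would be by independent truncation. The names `sliding_windows`, `window`, and `stride` are hypothetical, and the pointer-generator extraction itself is not reproduced.

```python
from typing import List, Sequence


def sliding_windows(tokens: Sequence[str], window: int, stride: int) -> List[List[str]]:
    """Split a token sequence into overlapping windows (hypothetical helper).

    Consecutive windows overlap by (window - stride) tokens, so text near a
    window boundary appears in both neighbouring windows instead of being
    dropped, as can happen when a long input is simply truncated.
    """
    if window <= 0 or not 0 < stride <= window:
        raise ValueError("require window > 0 and 0 < stride <= window")
    if len(tokens) <= window:
        # Short inputs fit in a single window unchanged.
        return [list(tokens)]
    windows: List[List[str]] = []
    start = 0
    while True:
        windows.append(list(tokens[start:start + window]))
        if start + window >= len(tokens):
            # The last window reaches the end of the sequence.
            break
        start += stride
    return windows


if __name__ == "__main__":
    doc = [f"t{i}" for i in range(10)]
    for w in sliding_windows(doc, window=4, stride=3):
        print(w)
```

With `window=4` and `stride=3`, a 10-token input yields three windows that share one token at each boundary; in the described module, each window's output would then be passed on for extraction rather than being discarded past a fixed cut-off.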
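The selection criterion described above (the masked sentence group should be semantically close to the whole document) can be illustrated with a toy sketch. This is an assumption-laden stand-in, not the thesis's semantic matching module: semantic similarity is replaced by bag-of-words cosine similarity, candidate groups are built greedily, and `MASK_TOKEN`, `select_gap_sentences`, and `make_gsg_example` are hypothetical names.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

MASK_TOKEN = "<mask_1>"  # hypothetical placeholder for a masked sentence


def bow(text: str) -> Counter:
    """Lower-cased bag-of-words counts; a crude stand-in for a semantic embedding."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_gap_sentences(sentences: List[str], k: int) -> List[int]:
    """Greedily choose k sentence indices whose combined vector best matches the document."""
    doc_vec = bow(" ".join(sentences))
    sent_vecs = [bow(s) for s in sentences]
    chosen: List[int] = []
    group: Counter = Counter()
    for _ in range(min(k, len(sentences))):
        remaining = [i for i in range(len(sentences)) if i not in chosen]
        # Add the sentence that makes the selected group most similar to the whole document.
        best = max(remaining, key=lambda i: cosine(group + sent_vecs[i], doc_vec))
        chosen.append(best)
        group = group + sent_vecs[best]
    return sorted(chosen)


def make_gsg_example(sentences: List[str], k: int) -> Tuple[str, str]:
    """Build a (masked source, target) pair for gap sentence generation training."""
    picked = set(select_gap_sentences(sentences, k))
    source = " ".join(MASK_TOKEN if i in picked else s for i, s in enumerate(sentences))
    target = " ".join(sentences[i] for i in sorted(picked))
    return source, target


if __name__ == "__main__":
    sents = [
        "The patient reports a cough.",
        "The weather was nice.",
        "The cough started three days ago.",
    ]
    src, tgt = make_gsg_example(sents, k=1)
    print(src)
    print(tgt)
```

In this toy input the off-topic sentence about the weather scores lowest, so it stays visible while a cough-related sentence is masked and becomes the prediction target. A learned sentence encoder would be the natural replacement for `bow` if one wanted to follow the abstract's notion of semantic matching more closely.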
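The abstract does not give the training objective for the self-supervised method. One common way to let auxiliary tasks supply extra supervision, shown here purely as an assumption, is to add their losses to the main summarization loss with weights. Here $x$ is a dialogue, $y$ its reference summary, $d$ the diagnostic result text, $c^{*}$ the diagnostic class, and $\lambda_{1}, \lambda_{2}$ are hypothetical weights; none of these symbols come from the thesis.

```latex
\mathcal{L} = \mathcal{L}_{\text{sum}} + \lambda_{1}\,\mathcal{L}_{\text{ext}} + \lambda_{2}\,\mathcal{L}_{\text{cls}},
\qquad
\mathcal{L}_{\text{sum}} = -\sum_{t} \log p_{\theta}(y_{t} \mid y_{<t}, x),
\qquad
\mathcal{L}_{\text{ext}} = -\sum_{t} \log p_{\theta}(d_{t} \mid d_{<t}, x),
\qquad
\mathcal{L}_{\text{cls}} = -\log p_{\theta}(c^{*} \mid x)
```

Under this reading, the extraction-generation and classification terms need no human-written summaries, which is what would let them ease the shortage of annotated doctor-patient dialogue data.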
Keywords/Search Tags: text summarization, doctor-patient dialogue, Transformer, semantic matching, gap sentences, self-supervision