
Research And System Implementation Of Medical Domain Machine Reading Comprehension

Posted on: 2024-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: W R Lv
Full Text: PDF
GTID: 2544307055498104
Subject: Computer technology
Abstract/Summary:
Machine reading comprehension (MRC) is an important branch of artificial intelligence that uses algorithms to enable computers to understand the semantics of an article and answer questions posed by users. Research on MRC in the medical field can help raise the level of medical care, reduce medical costs, and give more people access to richer medical resources. At present, there is little research on MRC in the medical field. To enrich the medical-domain datasets available for MRC, this thesis first constructs an MRC-oriented medical dataset from medical data-sharing platforms that are openly available on the Internet. Second, to improve the performance of MRC-based question answering in the medical field, it improves deep-learning MRC models and studies MRC tasks experimentally with BERT and its pre-trained language model variants. Finally, it combines the MRC model with question matching to build a medical question answering system. The main work of this thesis is as follows:

1. Construction of a medical MRC dataset, MedicalQA. The dataset is built by web crawling and manual annotation, mainly from two medical platforms: the Medical Seeking and Drug Seeking Network and the 39 Health Network. Nearly 20,000 question-answer pairs were crawled, covering nine departments including internal medicine, surgery, and obstetrics and gynecology. The dataset is ultimately used to build the medical question answering system.

2. Improvement of the Match-LSTM and BiDAF models. For Match-LSTM, this thesis applies a data reconstruction strategy that reorders the sentences of a passage by their relevance to the question, so that question-related sentences are read first and the features of the more relevant text are emphasized. The improved Match-LSTM reaches ROUGE-L and BLEU-4 scores of 33.96% and 27.80%, respectively, which are 3.89% and 3.17% higher than those of the unmodified Match-LSTM. For BiDAF, this thesis uses BERT to obtain pre-trained word vectors and adds a self-attention layer after the attention flow layer, which deepens the interaction between passage and question and emphasizes the passage features most closely related to the question. The improved BiDAF reaches ROUGE-L and BLEU-4 scores of 33.34% and 29.01%, respectively, which are 2.93% and 2.62% higher than those of the unmodified BiDAF.

3. MRC based on pre-trained language models. This thesis experiments with BERT and its pre-trained language model variants on the DuReader and MedicalQA datasets. The experiments show that an improved masking method and a multi-round fine-tuning mechanism significantly improve model performance on MRC tasks; the best model, RoBERTa-wwm-ext, achieves ROUGE-L and BLEU-4 scores of 51.02% and 48.14% on the test set. In addition, because model performance is not optimal when the dataset is large and the useful information is scattered, this thesis applies a three-step, F1-score-based preprocessing of the dataset: selecting relevant paragraphs, locating answer segments, and precomputing features. This brings the performance of the pre-trained language models closer to the average human level of reading comprehension.

4. Construction of a medical question answering system. This thesis combines a question matching algorithm with the best MRC model. The question matching algorithm first computes the relevance between the user's question and the questions in the MedicalQA dataset. If the relevance exceeds a threshold, the system directly outputs the corresponding answer from MedicalQA; otherwise, the MRC model is used to answer.
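The Match-LSTM data reconstruction strategy (reordering passage sentences so that question-relevant ones are read first) can be sketched as follows. The abstract does not say how sentence relevance is measured, so this minimal sketch assumes a simple lexical score: the fraction of question tokens that also appear in the sentence. The names `tokenize`, `relevance`, and `reorder_passage` are hypothetical and are not the thesis's code.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    # Hypothetical tokenizer (assumption): lowercase Latin words/digits,
    # and one token per CJK character for Chinese text.
    return re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]", text.lower())


def relevance(sentence: str, question: str) -> float:
    # Assumed relevance measure: share of distinct question tokens found in the sentence.
    q_tokens = set(tokenize(question))
    if not q_tokens:
        return 0.0
    return len(q_tokens & set(tokenize(sentence))) / len(q_tokens)


def reorder_passage(sentences: List[str], question: str) -> List[str]:
    # Most relevant sentences first; Python's sort is stable even with
    # reverse=True, so equally relevant sentences keep their original order.
    return sorted(sentences, key=lambda s: relevance(s, question), reverse=True)


if __name__ == "__main__":
    passage = [
        "Aspirin is an NSAID.",
        "Common side effects include stomach upset.",
        "It was first synthesized in 1897.",
    ]
    for sentence in reorder_passage(passage, "What are the side effects of aspirin?"):
        print(sentence)
```

A stable sort is used so that the reconstruction only promotes relevant sentences and otherwise preserves the passage's original reading order.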
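ROUGE-L, one of the two reported metrics, is an F-measure over the longest common subsequence (LCS) between a candidate answer and a reference answer: with $P = \mathrm{LCS}/|c|$ and $R = \mathrm{LCS}/|r|$, the score is $F = \frac{(1+\beta^2)PR}{R + \beta^2 P}$. The minimal sketch below assumes character-level tokens (common for Chinese answers) and $\beta = 1.2$, the value used by common ROUGE-L scorers such as the one in pycocoevalcap; the abstract states neither choice, so both are assumptions.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    # Dynamic programming for the LCS length, keeping only two rows of the table.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l(candidate: Sequence[str], reference: Sequence[str], beta: float = 1.2) -> float:
    # LCS-based F-measure; beta > 1 weights recall more heavily than precision.
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


if __name__ == "__main__":
    # Character-level tokens for Chinese answers (assumption).
    print(rouge_l(list("低盐低脂饮食"), list("建议低盐饮食")))
```

Corpus-level scores such as the percentages reported above are typically averages of this per-answer score over the test set.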
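The first of the three F1-based preprocessing steps, selecting the most relevant paragraph, might look like the sketch below. It assumes character-level token F1 computed against the question (a reference answer could be passed instead when preparing training data); the thesis's actual scoring target and tokenization are not given in the abstract, and `token_f1` and `select_paragraph` are hypothetical names.

```python
from collections import Counter
from typing import List, Sequence


def token_f1(pred: Sequence[str], ref: Sequence[str]) -> float:
    # Multiset overlap: Counter intersection keeps the minimum count of each token.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def select_paragraph(paragraphs: List[str], query: str) -> int:
    # Return the index of the highest-F1 paragraph (paragraphs must be non-empty).
    # Character-level tokens are an assumption suited to Chinese text.
    scores = [token_f1(list(p), list(query)) for p in paragraphs]
    return max(range(len(paragraphs)), key=scores.__getitem__)


if __name__ == "__main__":
    docs = ["头痛可能由感冒引起", "高血压患者应低盐饮食"]
    print(select_paragraph(docs, "高血压饮食注意什么"))
```

Restricting the model input to the selected paragraph is one way to address the scattered-information problem the abstract describes, since the pre-trained model then reads a short, relevant span instead of a long document.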
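The routing logic of the question answering system (reuse a stored answer when a dataset question matches closely enough, otherwise fall back to the MRC model) can be sketched as a threshold test. Here the standard-library `difflib.SequenceMatcher.ratio()` stands in for the thesis's question matching algorithm, which the abstract does not specify; the 0.8 threshold, the `faq` dictionary, and the `mrc_model` callable are hypothetical placeholders.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Tuple


def best_match(question: str, faq: Dict[str, str]) -> Tuple[str, float]:
    # Find the stored question most similar to the user's question.
    best_q, best_score = "", 0.0
    for stored_q in faq:
        score = SequenceMatcher(None, question, stored_q).ratio()
        if score > best_score:
            best_q, best_score = stored_q, score
    return best_q, best_score


def answer(
    question: str,
    faq: Dict[str, str],
    mrc_model: Callable[[str], str],
    threshold: float = 0.8,  # hypothetical cut-off value
) -> str:
    matched_q, score = best_match(question, faq)
    if score > threshold:
        return faq[matched_q]  # close match: output the stored dataset answer
    return mrc_model(question)  # otherwise: answer with the MRC model


if __name__ == "__main__":
    faq = {"高血压饮食注意什么": "低盐低脂饮食。"}
    print(answer("高血压饮食注意什么", faq, lambda q: "(MRC answer)"))
    print(answer("感冒吃什么药", faq, lambda q: "(MRC answer)"))
```

Answering close matches directly from the curated dataset avoids running the comparatively expensive MRC model for questions that already have a stored answer.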
Keywords/Search Tags: machine reading comprehension, medical domain, attention mechanism, BERT pre-trained language model, masking mode