
Study On Information Extraction And Optimization Of Medical Case Texts Based On BERT

Posted on: 2024-02-15
Degree: Master
Type: Thesis
Country: China
Candidate: C Xue
Full Text: PDF
GTID: 2544307181954459
Subject: Engineering
Abstract/Summary:
In recent years, with the continuous development of natural language processing technology and the application of deep learning algorithms, research in the medical field has gradually shifted toward intelligence and automation. In clinical medicine, case text is an important form of medical data, recording key information such as the patient's basic information, medical history, symptom descriptions, and the doctor's diagnosis and treatment process. This information is crucial for doctors to make correct diagnoses and treatment decisions. However, the specialized knowledge from fields such as biology contained in case text, with its semantic complexity and inconsistent vocabulary, makes reading and comprehension difficult for doctors. How to use natural language processing technology to automatically analyze and interpret case text, extract useful information, and assist and guide doctors has therefore long been a hot and difficult problem in the medical field.

BERT is currently one of the most advanced pre-trained language models: it learns a large amount of linguistic knowledge during pre-training and can adapt to various downstream tasks through fine-tuning. Applied to case-text information extraction, a BERT model can analyze case text and automatically extract information such as disease names, disease types, causes, and symptoms, reducing doctors' workload and work pressure. However, BERT's pre-training phase is conducted on a large-scale general-domain corpus that does not sufficiently cover specialized medical terminology, so when the model is applied in the medical domain it may face data sparsity and distribution mismatch. To address this, the thesis proposes a domain-adaptive BERT-based method for extracting information from medical case texts. Then, to address the high false-positive rate in text extraction caused by the complex semantics and inconsistent vocabulary of specialized knowledge in fields such as medicine and biology, the thesis proposes an optimized BERT-based Prompt-paradigm learning method for medical case text, aiming to reduce false positives during extraction. Finally, on this basis, the design and implementation of a medical case text extraction system are completed. The specific research content is as follows:

(1) Research on a domain-adaptive BERT-based method for extracting case-text information. Because BERT is pre-trained on large-scale general-domain data, it may face data sparsity and distribution mismatch when applied to the case-text domain. To solve this, we propose a domain-adaptive BERT-based extraction method: first, BERT's original pre-training tasks are continued on case text so that the model adapts to the domain data; then the domain-adapted BERT is fine-tuned on the case-text extraction task. Finally, ablation experiments against classical methods verify the effectiveness of the approach. Experimental results show that domain-adaptive pre-training improves case-text extraction performance over classical methods that fine-tune the downstream task directly.

(2) Research on optimizing the learning of medical case text with a BERT-based Prompt paradigm. Because medical case text has high semantic complexity and lexical ambiguity, traditional text extraction methods are prone to false positives. To address this, the study proposes an optimization method based on the BERT Prompt paradigm. First, difficult samples are identified by analyzing per-sample false-positive rates. Then, a Prompt paradigm is constructed from these difficult samples and the specific downstream task. Finally, the Prompt paradigm is introduced into the training of downstream tasks, providing the BERT model with additional guidance for more accurate identification of entities, relations, and events in medical case text. Experimental results demonstrate that incorporating the Prompt paradigm significantly improves BERT-based text extraction over traditional approaches.

(3) Design and implementation of a BERT-based medical case text information extraction system. Adopting an object-oriented programming approach and using Tkinter as the development framework, we developed a medical case text information extraction system with a fine-tuned BERT model as the inference network. The system takes preprocessed data as input, performs extraction according to the extraction targets, and outputs text indices that can be used to retrieve the extracted text.
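The domain-adaptive step in (1) continues BERT's original masked-language-model pre-training on case text. The standard BERT MLM corruption rule selects about 15% of positions; of those, 80% become [MASK], 10% become a random token, and 10% are kept unchanged. A minimal pure-Python sketch of this rule (the toy vocabulary and function name are illustrative assumptions, not artifacts of the thesis):

```python
import random

MASK = "[MASK]"
# Toy substitution vocabulary, assumed for illustration only.
VOCAB = ["fever", "cough", "pain", "rash"]

def mask_tokens(tokens, rng, mask_prob=0.15):
    """BERT-style MLM corruption: pick ~mask_prob of positions; of
    those, 80% -> [MASK], 10% -> random vocab token, 10% unchanged.
    Returns (corrupted, labels); labels hold the original token at
    selected positions and None elsewhere (positions with no loss)."""
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)          # model must predict the original
            r = rng.random()
            if r < 0.8:
                corrupted.append(MASK)
            elif r < 0.9:
                corrupted.append(rng.choice(VOCAB))
            else:
                corrupted.append(tok)   # kept, but still supervised
        else:
            labels.append(None)
            corrupted.append(tok)
    return corrupted, labels
```

In practice a library such as Hugging Face Transformers applies this rule batch-wise over token IDs; the sketch only shows the corruption logic that further pre-training on case text reuses.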
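The hard-sample mining in (2) flags samples whose predictions contain spans absent from the gold annotation, then wraps them into prompts for further training. A minimal sketch; the prompt template, helper names, and set-based comparison are illustrative assumptions, not the thesis's actual construction:

```python
def false_positive_spans(predicted, gold):
    """Spans the extractor predicted as entities that do not appear
    in the gold annotation (i.e. false positives for this sample)."""
    gold_set = set(gold)
    return [span for span in predicted if span not in gold_set]

def build_prompts(text, hard_spans,
                  template='In the record "{text}", the span "{span}" refers to [MASK].'):
    """Wrap each hard (false-positive) span into a cloze-style prompt
    so the model receives extra supervision on exactly the cases it
    got wrong. Template wording is a hypothetical example."""
    return [template.format(text=text, span=span) for span in hard_spans]
```

Samples producing many false positives are treated as difficult; their prompts are mixed into downstream-task training to steer the model away from the same mistakes.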
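The system in (3) outputs text indices rather than raw strings, and the extracted text is recovered from them. Assuming the indices are (start, end) character offsets into the input, the retrieval step can be sketched as follows (function name assumed):

```python
def retrieve_spans(text, indices):
    """Map (start, end) character offsets emitted by the extraction
    model back to the surface text they cover. Half-open slices,
    matching Python string-slicing convention (an assumption about
    the system's index format)."""
    return [text[start:end] for start, end in indices]
```

Emitting indices keeps the model output compact and lets the GUI layer highlight the extracted spans in the original record.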
Keywords/Search Tags: Natural Language Processing, Neural Network, Text Extraction