
Research On Named Entity Recognition And Normalization For Biomedical Text

Posted on: 2023-11-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Sun
GTID: 2530306827475024
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of medical informatization and biotechnology, the number of biomedical documents and electronic medical records is growing exponentially. The rich information contained in this large body of biomedical text has become a valuable resource for biomedical research. However, because most biomedical text is unstructured natural language, computers cannot directly analyze and use it effectively. Extracting valuable information from biomedical text with text mining techniques will therefore have a far-reaching, positive impact on the progress of biomedical technology and on the development of informatization in medicine and health care. Biomedical Named Entity Recognition and Normalization is one of the basic tasks of Biomedical Text Mining. It aims to identify predefined biomedical entities in biomedical text and map them to standard ICD codes or other biomedical ontologies, in order to support downstream applications such as biomedical knowledge graph construction, intelligent medicine, and intelligent control of medical insurance costs. On this basis, this thesis studies the tasks of Biomedical Named Entity Recognition and Clinical Term Normalization.

To address nested entities and the lack of manually labeled corpora in Chinese medical named entity recognition, this thesis proposes a medical entity recognition method based on Pointer Network and Adversarial Training. With Pointer Network annotation, nested entities and flat entities can be identified in a uniform way. Adversarial Training adds perturbations to the text vector representation to generate adversarial examples, which effectively alleviates the poor robustness of the model. The experimental results show that the improved annotation strategy and the introduction of adversarial training both effectively improve the performance of the model.

To address the colloquial, nonstandard, and diverse descriptions of clinical terms in Clinical Term Normalization, this thesis proposes a Clinical Term Normalization method based on deep semantic matching. A candidate term set is generated from the standard term set using the Jaccard similarity algorithm, the deep semantic features of clinical terms are extracted with a BERT model, and a classification model is built to obtain the standard clinical term names. The method was tested on the dataset of the CHIP2019 Clinical Term Normalization task and reached an accuracy of 90.04%, which verifies its effectiveness.

Chinese multi-implication Clinical Term Normalization poses further problems: the number of implied standard words is uncertain, the literal overlap between the original text and the standard words is low, the standard words depend on one another, and the standard words cover many parts of speech. To address these problems, this thesis proposes a multi-implication Clinical Term Normalization method based on knowledge enhancement. The method follows a three-stage strategy of coarse recall, fine-grained ranking, and re-matching. Performance is improved by building a module that predicts the number of standard words, introducing a knowledge representation learning algorithm to capture the internal relationships between standard words, and constructing a set of mapping rules between clinical diagnosis text and standard words. The method was tested on the dataset of the CHIP2020 Clinical Term Normalization task, and the experimental results verify its effectiveness for the multi-implication Clinical Term Normalization task.
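
The candidate generation step mentioned in the abstract (selecting candidate terms from the standard term set by Jaccard similarity) can be illustrated with a minimal sketch. The abstract does not say how terms are tokenized or how many candidates are kept, so the character-level sets, the `top_k` parameter, the function names, and the sample terms below are all assumptions made for illustration and are not taken from the thesis.

```python
from typing import List, Tuple


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the character sets of two strings.

    Character-level sets are an assumption; the thesis may tokenize differently.
    """
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0  # define the similarity of two empty strings as 0
    return len(set_a & set_b) / len(set_a | set_b)


def generate_candidates(
    mention: str, standard_terms: List[str], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Return the top_k standard terms most similar to the mention (hypothetical helper)."""
    scored = [(term, jaccard(mention, term)) for term in standard_terms]
    # Sort by similarity, highest first; ties keep their original order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up example data, for illustration only.
standard_terms = [
    "acute appendicitis",
    "chronic gastritis",
    "acute gastritis",
    "type 2 diabetes mellitus",
]
candidates = generate_candidates("acute gastritis attack", standard_terms, top_k=2)
for term, score in candidates:
    print(f"{term}: {score:.3f}")
```

In a pipeline like the one the abstract describes, a shortlist of this kind would then be passed to the BERT-based matching model, so the Jaccard step only needs high recall rather than a precise ranking.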
Keywords/Search Tags:Biomedical Information Extraction, Named Entity Recognition, Clinical Term Normalization, Text Matching