
Named Entity Recognition For Medical Field

Posted on: 2023-06-12 | Degree: Master | Type: Thesis
Country: China | Candidate: L F Qiu | Full Text: PDF
GTID: 2544306845475264 | Subject: Information and Communication Engineering
Abstract/Summary:
Building intelligent medical systems from medical data is an important way to alleviate the shortage of medical resources. Most medical data takes the form of unstructured text, such as medical literature and medical records, which contains a large amount of medical knowledge related to patient health. The medical terms in these texts are the most direct embodiment of that knowledge, so identifying them efficiently has become a key factor in the performance of intelligent medical systems. The objective of medical named entity recognition is to extract medically significant entities such as symptoms, medical history, and diseases from medical texts. For open-domain named entity recognition, model performance has become very mature since the appearance of BERT. The semantic relationships in medical texts are more complicated, however. In Chinese medical records in particular, many medical terms are long and written in non-standard ways, which makes recognition harder. In addition, medical terms in medical texts are often nested and discontinuous, so open-domain named entity recognition models are difficult to apply directly to the medical field. Compared with general continuous named entity recognition, there is relatively little existing work on nested and discontinuous entity recognition for medicine, and in most cases nested entities and discontinuous entities are studied separately. In practice, data sets that contain discontinuous entities often contain nested entities as well. This thesis therefore redefines a generalized discontinuous named entity recognition task, in which general continuous entities, nested entities, and discontinuous entities in text are all targets of extraction. The main work of this thesis is as follows:
(1) A Chinese biomedical data set that can be used for discontinuous named entity recognition tasks is annotated. This thesis surveys the medical data sets commonly used for named entity recognition in different task scenarios and finds that Chinese data sets suitable for discontinuous named entity recognition are lacking. Under the guidance of a doctor, and based on real electronic medical record data, a Chinese medical record text data set containing multiple types of named entities is therefore annotated with BRAT.
(2) A new baseline model for discontinuous named entity recognition is proposed. Existing models for discontinuous named entity recognition cannot identify discontinuous entities effectively. This thesis improves the existing baseline model in two respects, the label scheme and the introduction of a label correction module, and uses the improved model as the comparison model for the discontinuous named entity recognition task.
(3) Drawing on work in dependency parsing, a hypergraph-based method and a transition-based method are applied to named entity recognition, yielding two named entity recognition models that can recognize continuous and nested entities separately as well as multiple types of entities simultaneously. Experiments were conducted on the corresponding Chinese and English medical data sets, and the results were recorded and compared with the baseline model for analysis.
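The three entity shapes covered by the generalized task (continuous, nested, and discontinuous) can be illustrated with a minimal sketch. The sentence, the span encoding (token offsets with an exclusive end), and the helper names `surface` and `classify` below are hypothetical choices made for illustration; they are not taken from the thesis or its data set.

```python
from typing import List, Tuple

Span = Tuple[int, int]   # token offsets, end exclusive (an assumed encoding)
Entity = List[Span]      # one span = contiguous; several spans = discontinuous


def surface(tokens: List[str], entity: Entity) -> str:
    """Join the tokens covered by each span of an entity into its surface text."""
    return " ".join(" ".join(tokens[start:end]) for start, end in entity)


def classify(entities: List[Entity]) -> List[str]:
    """Label each entity as 'continuous', 'nested', or 'discontinuous'.

    Here 'nested' means a single-span entity that lies inside another,
    different single-span entity; this is a simplification for illustration.
    """
    kinds = []
    for i, entity in enumerate(entities):
        if len(entity) > 1:
            kinds.append("discontinuous")
            continue
        start, end = entity[0]
        inside_other = any(
            j != i
            and len(other) == 1
            and other[0][0] <= start
            and end <= other[0][1]
            and other[0] != (start, end)
            for j, other in enumerate(entities)
        )
        kinds.append("nested" if inside_other else "continuous")
    return kinds


# Hypothetical example sentence, not drawn from the annotated data set.
tokens = ["left", "lung", "cancer", "with", "muscle", "pain", "and", "fatigue"]
entities = [
    [(0, 3)],          # "left lung cancer"
    [(1, 2)],          # "lung", inside the entity above
    [(4, 6)],          # "muscle pain"
    [(4, 5), (7, 8)],  # "muscle fatigue", skipping "pain and"
]

for entity, kind in zip(entities, classify(entities)):
    print(f"{surface(tokens, entity):<18} {kind}")
```

A single extraction target that may hold one or several spans is enough to represent all three shapes, which is the sense in which the task treats them uniformly.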
Keywords/Search Tags: Medical named entity recognition, Nested entity, Discontinuous entity, Dependency parsing