Research on Biomedical Named Entity Recognition Method Based on Deep Learning

Posted on: 2022-09-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Liu
GTID: 2480306764493364
Subject: Automation Technology
Abstract/Summary:
In the era of big data, the number of ways to obtain data keeps growing, and the biomedical literature is expanding rapidly. Relying on humans to manage the knowledge contained in this literature is no longer realistic, so both academia and industry urgently need technology for automatic information extraction from biomedical text. As a basic and key information extraction technology, named entity recognition (NER) aims to identify the entities in a text through sequence labeling. Effective identification of named entities lays the foundation for downstream tasks such as entity relation extraction and entity normalization. In recent years, with the rise of deep learning, deep learning-based biomedical NER (BioNER) models have attracted growing attention from researchers. Based on an analysis of the problems in existing deep learning-based BioNER models, this thesis improves those models to raise recognition performance. The main contributions are as follows:

(1) Conditional random fields (CRFs) are widely used in deep learning methods because of their strong sequence modeling ability. In practice, however, their imperfect tag-sequence inference mechanism can cause error propagation, a problem that is particularly evident in biomedical corpora. We propose a gated CRF to alleviate this problem; it significantly improves on the model based on a standard CRF.

(2) The embedding layer of existing models uses rich word-level and character-level features but ignores the syntactic dependencies of the text, which are also important for named entity recognition. This thesis proposes a feature enhancement module based on the self-attention mechanism to introduce syntactic dependency features into the model. Experimental results demonstrate the importance of syntactic dependency information for BioNER and the effectiveness of the proposed feature enhancement module.

(3) To address the problems of static word embeddings, namely the representation of polysemous words, out-of-vocabulary (OOV) words, and poor performance on small datasets, this thesis proposes a BioNER model based on BioBERT. With the strong context modeling ability of BioBERT, our method obtains F1 values of 76.78%, 83.42%, 88.72%, 91.13%, 93.63%, and 86.37% on the biomedical datasets JNLPBA, BC2GM, NCBI-disease, BC4CHEMD, BC5CDR-chem, and BC5CDR-disease, respectively. Compared with models based on static word embeddings, our method improves significantly.
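
The abstract frames NER as sequence labeling and reports F1 values, but it does not state the tag scheme or the scoring protocol. As a rough illustration only, the sketch below assumes BIO tags and exact-match entity-level F1, which are common choices for these datasets. The function names, the toy sentence, and its tags are hypothetical and are not taken from the thesis.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive); a hypothetical alias
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        # An "I-" tag continues the open entity only if the type matches.
        if tag.startswith("I-") and etype == tag[2:]:
            continue
        # Any other tag closes the entity that is currently open.
        if etype is not None:
            spans.add((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        # A stray "I-" with no matching open entity is ignored (lenient choice).
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Exact-match entity-level F1 between two tag sequences."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_spans)
    recall = true_pos / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Toy example (illustrative tokens and tags, not from any dataset).
tokens = ["IL-2", "gene", "expression", "requires", "NF-kappa", "B"]
gold = ["B-DNA", "I-DNA", "O", "O", "B-protein", "I-protein"]
pred = ["B-DNA", "I-DNA", "O", "O", "B-protein", "O"]

print(sorted(bio_to_spans(gold)))  # [('DNA', 0, 2), ('protein', 4, 6)]
print(entity_f1(gold, pred))       # 0.5
```

In this toy case the predicted protein span stops one token early, so it counts as both a false positive and a false negative under exact matching. This strictness is one reason boundary errors on long biomedical entity names weigh heavily on reported F1.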
Keywords/Search Tags: named entity recognition, deep learning, conditional random fields, self-attention mechanism