
Research on Biomedical Named Entity Recognition Based on Deep Learning

Posted on: 2024-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: X C Fan
GTID: 2530307100995359
Subject: Master of Electronic Information (Professional Degree)
Abstract/Summary:
With the rapid growth of the biomedical literature and rapid advances in natural language processing (NLP) research, biomedical literature mining has become increasingly important. More researchers are turning their attention to extracting meaningful knowledge from biomedical texts, and advances in deep learning are driving the development of new literature mining models in biomedicine. However, because word distributions differ between general-domain corpora and domain-specific biomedical corpora, applying general NLP techniques directly to biomedical text mining often yields poor recognition accuracy, and the same entity mention may be labeled inconsistently across different sentences. To address these problems, this thesis investigates biomedical named entity recognition using deep learning, with the following main contributions:

(1) This thesis presents a pre-training data enhancement method based on an improved RTD task. Traditional pre-training methods such as BERT usually use masked language modeling (MLM): they corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens. Although such models transfer well to downstream NLP tasks, they usually require a large amount of computation to be effective. As an alternative, this thesis proposes a more effective pre-training task called replaced token detection (RTD) and constructs BioELECTRA, a pre-trained language model for the biomedical domain. Experimental results show that RTD is more effective than MLM as a pre-training task, which further improves the precision of named entity recognition.

(2) To address the inconsistent labeling produced by the traditional BiLSTM-CRF architecture, this thesis applies an attention mechanism that attends to related tokens in different sentences of the same document, and proposes the Att-BiLSTM-CRF architecture for the biomedical domain. Comparative experiments on the BC4CHEMD dataset show that the document-level Att-BiLSTM-CRF model performs better, with precision, recall, and F-score increased by 0.34%, 2.21%, and 0.98%, respectively. Finally, the effects of additional features such as part-of-speech (POS) tags, chunking, and dictionary (Dic) features on model performance are studied experimentally.
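
The difference between the MLM and RTD objectives described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: the function names, the toy vocabulary, and the use of a random sampler in place of a trained generator network are all assumptions made for illustration only.

```python
import random

MASK = "[MASK]"


def mlm_corrupt(tokens, positions):
    """MLM-style corruption: replace the chosen positions with [MASK].

    Returns the corrupted sequence and a dict of prediction targets,
    one per masked position (the original token to reconstruct).
    """
    corrupted = list(tokens)
    targets = {}
    for i in positions:
        targets[i] = tokens[i]
        corrupted[i] = MASK
    return corrupted, targets


def rtd_corrupt(tokens, positions, vocab, rng):
    """RTD-style corruption: replace the chosen positions with sampled tokens.

    Assumption: a random draw from `vocab` stands in for the small
    generator model that would normally propose plausible replacements.
    Returns the corrupted sequence and one binary label per token:
    1 if the token differs from the original, 0 otherwise.
    """
    corrupted = list(tokens)
    for i in positions:
        corrupted[i] = rng.choice(vocab)
    # If the sampler happens to pick the original token, the position
    # counts as "original" (label 0), because nothing actually changed.
    labels = [int(c != t) for c, t in zip(corrupted, tokens)]
    return corrupted, labels


# Hypothetical toy sentence and vocabulary.
tokens = ["aspirin", "inhibits", "platelet", "aggregation"]
vocab = ["binds", "protein", "inhibits", "kinase"]
positions = [1, 3]

mlm_input, mlm_targets = mlm_corrupt(tokens, positions)
rtd_input, rtd_labels = rtd_corrupt(tokens, positions, vocab, random.Random(0))
```

In this sketch the MLM objective produces a training signal only at the two masked positions, whereas the RTD objective produces a label for every token in the sentence. That difference is one commonly cited reason why RTD-style pre-training can need less computation to reach comparable quality.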
Keywords/Search Tags: Biomedical named entity recognition, RTD, Pre-training data enhancement, BioELECTRA, Att-BiLSTM-CRF