
Research On Biomedical Named Entity Recognition Based On Deep Learning

Posted on: 2019-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Jiang
Full Text: PDF
GTID: 2428330566984188
Subject: Computer Science and Technology
Abstract/Summary:
Biomedical named entity recognition is an important preliminary step for many biomedical information extraction tasks, such as biomedical relation extraction and event extraction. The current mainstream methods for biomedical named entity recognition are based on neural networks, which avoid the complex hand-designed features derived from various linguistic analyses. However, existing neural network models still do not match the performance of the best shallow machine learning methods. How to use neural networks to improve the performance of biomedical named entity recognition is therefore the main subject of this thesis.

Conventional neural networks can ignore potentially useful word-level and sentence-level semantic information. To avoid this, we propose a novel Long Short-Term Memory (LSTM) network model that integrates two channels and a sentence-level reading control gate. On the input side, two channels are added to the architecture to pick up information from the pre-trained and the fine-tuned word embeddings, respectively. A sentence-level reading control gate is then introduced into the model to decide what information should be retained or discarded for future time steps. Finally, we use a CRF model to capture the dependencies between tagging decisions. The experimental results show that this method achieves an F1-score of 89.49% on the BioCreative II GM corpus.

Although integrating two channels lets the network consider richer semantic information, problems remain when out-of-vocabulary words occur in the corpus. We therefore consider character-level word embeddings and a language model on top of the LSTM-CRF with the sentence-level reading control gate. On the input side, character-level word embeddings are added to describe the spelling of each word more accurately, and they are combined with the original word embeddings through an attention mechanism to form the final input. At the same time, a language model is integrated into the neural network to learn general-purpose patterns of semantic and syntactic composition from all available data. The features learned by the language model can then be reused in the network to predict labels more accurately. This method obtains an F1-score of 89.94% on the BioCreative II GM corpus, superior to all existing systems, and also achieves satisfactory results on the JNLPBA corpus.

Overall, this thesis uses two deep learning architectures to improve the performance of biomedical named entity recognition. The proposed model outperforms all existing systems on the BioCreative II GM corpus without complex hand-designed features or post-processing, with an F1-score 0.89% higher than that of the current best-performing system.
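To illustrate why a CRF layer on top of the LSTM helps with tagging decisions, the following is a minimal sketch of Viterbi decoding for a linear-chain CRF. It is not the thesis's implementation: the BIO tag set, the example tokens, and all scores are hypothetical values chosen for illustration, and the per-token emission scores merely stand in for what an LSTM would produce.

```python
from typing import Dict, List, Sequence

TAGS = ["B", "I", "O"]  # BIO scheme; assumed here for illustration


def viterbi_decode(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    start: Dict[str, float],
) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][tag]      : score of `tag` at position t (e.g. from an LSTM)
    transitions[prev][tag] : score of moving from `prev` to `tag`
    start[tag]             : score of starting the sentence with `tag`
    """
    if not emissions:
        return []
    # best[tag] = best score of any path that ends in `tag` at this position
    best = {tag: start[tag] + emissions[0][tag] for tag in TAGS}
    backpointers: List[Dict[str, str]] = []
    for scores in emissions[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for tag in TAGS:
            prev = max(TAGS, key=lambda p: best[p] + transitions[p][tag])
            new_best[tag] = best[prev] + transitions[prev][tag] + scores[tag]
            pointers[tag] = prev
        best = new_best
        backpointers.append(pointers)
    # Start from the best final tag and follow the back-pointers.
    tag = max(TAGS, key=lambda t: best[t])
    path = [tag]
    for pointers in reversed(backpointers):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


# Hypothetical scores for the three tokens "IL-2", "gene", "expression".
NEG = -1e4  # large penalty that effectively forbids a transition
emissions = [
    {"B": 1.0, "I": 1.5, "O": 0.2},  # taken alone, "I" would win here
    {"B": 0.1, "I": 2.0, "O": 0.3},
    {"B": 0.1, "I": 0.2, "O": 2.0},
]
transitions = {
    "B": {"B": 0.0, "I": 0.0, "O": 0.0},
    "I": {"B": 0.0, "I": 0.0, "O": 0.0},
    "O": {"B": 0.0, "I": NEG, "O": 0.0},  # "I" may not follow "O"
}
start = {"B": 0.0, "I": NEG, "O": 0.0}  # a sentence may not start with "I"

print(viterbi_decode(emissions, transitions, start))  # ['B', 'I', 'O']
```

Choosing the best tag per token independently would give the invalid sequence I, I, O. Scoring whole sequences with transition terms yields B, I, O instead, which is the kind of dependency between tagging decisions that a CRF layer models.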
Keywords/Search Tags: Biomedical Named Entity Recognition, Deep Learning, Two Channels, Reading Control Gate, Language Model