Biomedical Named Entity Recognition (BioNER) aims to automatically extract entity mentions such as Disease, Gene, and Chemical from large volumes of unstructured text, and serves as the basis for downstream Natural Language Processing (NLP) tasks. Current deep-learning-based BioNER methods typically require large amounts of training data, yet annotated BioNER datasets are often difficult to obtain and small in scale, owing to privacy constraints, ethical restrictions, and the high degree of domain expertise required for annotation. To alleviate this problem, and unlike conventional methods that use only token-level information, we propose a method that simultaneously exploits the latent multi-source information in the dataset. Concretely, we design multiple auxiliary tasks to make full use of the coarse-grained information implicit in the dataset itself, thereby improving BioNER performance.

In addition, most BioNER methods do not consider domain knowledge. This thesis therefore presents a preliminary exploration of recasting BioNER as a machine reading comprehension (MRC) problem, introducing prior knowledge through carefully designed question-answer pairs.

Furthermore, current neural BioNER systems treat individual sentences as independent training units and ignore document-level context. Methods that discard document-level contextual information often suffer from the tagging inconsistency problem, i.e., the same entity mention appearing in different sentences is incorrectly assigned different labels. To tackle this problem, we propose a document-level BioNER model with an additional cache module that helps capture inter-sentence information. To update the cache dynamically, we design an auxiliary task that measures the importance of historical encoder states and train it jointly with document-level BioNER.

We use the state-of-the-art pre-trained model BioBERT as the baseline system and conduct experiments on three publicly available BioNER datasets. The results show that the model introducing intra-sentence coarse-grained information achieves absolute F1 improvements of 0.40, 0.37, and 0.91 points on the three datasets, respectively; the model introducing prior knowledge achieves improvements of 0.46, 0.30, and 0.43 points; and the model introducing inter-sentence information achieves improvements of 0.30, 0.53, and 1.08 points.
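
The abstract describes the auxiliary tasks only at a high level. Below is a minimal PyTorch sketch of one plausible multi-task setup, assuming a BioBERT-style encoder, a token-level tagging head, and a hypothetical sentence-level auxiliary head that predicts whether a sentence contains any entity; the head design, the choice of auxiliary task, and the loss weight `aux_weight` are illustrative assumptions rather than the thesis's actual configuration.

```python
import torch.nn as nn

class MultiTaskBioNER(nn.Module):
    def __init__(self, encoder, hidden_size, num_labels, aux_weight=0.1):
        super().__init__()
        self.encoder = encoder                                # e.g., a BioBERT encoder
        self.token_head = nn.Linear(hidden_size, num_labels)  # fine-grained BIO tagging
        self.aux_head = nn.Linear(hidden_size, 2)             # coarse: any entity in sentence?
        self.aux_weight = aux_weight                          # hypothetical loss weight

    def forward(self, input_ids, attention_mask, token_labels, aux_labels):
        hidden = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        token_logits = self.token_head(hidden)                # (batch, seq_len, num_labels)
        aux_logits = self.aux_head(hidden[:, 0])              # [CLS] state for the sentence task
        ce = nn.CrossEntropyLoss(ignore_index=-100)           # -100 marks padded label positions
        loss = ce(token_logits.view(-1, token_logits.size(-1)), token_labels.view(-1))
        loss = loss + self.aux_weight * ce(aux_logits, aux_labels)
        return loss, token_logits
```

Sharing the encoder across both heads lets the coarse-grained signal regularize the token-level tagger without extra annotation, since sentence-level labels of this kind can be derived from the existing BIO tags.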
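For the MRC reformulation, each entity type is paired with a natural-language question that injects prior knowledge, and the model extracts answer spans from the passage. The sketch below shows only the input construction with the Hugging Face tokenizer; the question templates are hypothetical, and the thesis's actual templates and span-prediction heads are not shown.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.1")

# Hypothetical question templates, one per entity type; the actual wording
# (which carries the prior knowledge) may differ in the thesis.
QUESTIONS = {
    "Disease":  "Which disease mentions, such as cancers or syndromes, appear in the text?",
    "Gene":     "Which gene or protein mentions appear in the text?",
    "Chemical": "Which chemical or drug mentions appear in the text?",
}

def build_mrc_input(entity_type, passage, max_length=256):
    """Encode a (question, passage) pair; answer spans are predicted over the passage."""
    return tokenizer(
        QUESTIONS[entity_type],
        passage,
        truncation="only_second",   # truncate only the passage, never the question
        max_length=max_length,
        return_tensors="pt",
    )

enc = build_mrc_input("Disease", "BRCA1 mutations increase the risk of breast cancer.")
```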
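The cache module is described only functionally in the abstract, so the sketch below is an assumption-laden reading: the cache holds a bounded set of historical encoder states per document, an auxiliary importance scorer (trained jointly with the tagging objective, as the abstract indicates) decides which states to keep, and the tagger reads the cache through attention. The capacity, eviction rule, and read mechanism are all illustrative.

```python
import torch
import torch.nn as nn

class HistoryCache(nn.Module):
    """Bounded cache of historical encoder states for one document (illustrative)."""

    def __init__(self, hidden_size, capacity=128):
        super().__init__()
        self.scorer = nn.Linear(hidden_size, 1)   # auxiliary importance head
        self.capacity = capacity
        self.register_buffer("memory", torch.empty(0, hidden_size))

    def update(self, states):
        """Add new sentence states, then evict all but the highest-scoring ones."""
        memory = torch.cat([self.memory, states.detach()], dim=0)
        if memory.size(0) > self.capacity:
            scores = self.scorer(memory).squeeze(-1)
            memory = memory[scores.topk(self.capacity).indices]
        self.memory = memory

    def read(self, query):
        """Attend over cached states to inject inter-sentence context into `query`."""
        if self.memory.size(0) == 0:
            return torch.zeros_like(query)
        attn = torch.softmax(query @ self.memory.t() / query.size(-1) ** 0.5, dim=-1)
        return attn @ self.memory
```

Detaching the stored states keeps the cache from extending the backward graph across sentences; in this reading, the scorer would receive its gradients through the auxiliary importance task rather than through the cached values themselves.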