Research On Entity Recognition For Biomedical Texts

Posted on:2023-02-13

Degree:Master

Type:Thesis

Country:China

Candidate:Y B Xiao

GTID:2544306836973319

Subject:Electronic and communication engineering

Abstract/Summary:

Named entity recognition (NER), a core and fundamental task in natural language processing, is a key technique for analyzing and managing massive amounts of text, and it has considerable practical and economic value in the era of big data. This is especially true in the biomedical domain: with the rapid development of the Internet, a large volume of valuable biomedical text is stored online in unstructured or semi-structured form. How to effectively extract latent, valuable information from such text and alleviate the challenges posed by information overload has therefore become an active research question. This thesis studies named entity recognition in biomedical texts using deep learning methods. Its main contributions are as follows.

(1) Entity recognition in biomedical texts is analyzed and modeled, and a multi-feature model based on a long short-term memory (LSTM) network is proposed to suit the characteristics of biomedical texts. The input layer combines three kinds of representation: word, character, and part-of-speech (POS). The word representations consist of local and global word vectors, and the character representations combine the encodings produced by a convolutional neural network (CNN) and by an LSTM. On the JNLPBA and NCBI-disease datasets, the proposed model outperforms other prevalent methods, reaching F1 scores of 75.42% and 86.96%, respectively. Ablation experiments further show that the character and POS features play a vital role in improving performance, with the character features having the greatest impact on the results.

(2) Pretraining is applied to the biomedical entity recognition task, and a multi-task joint training model based on a pretrained model is proposed. Unlike previous models, which generally use a single task as the training target, the proposed method learns several tasks jointly. The thesis also introduces a dynamically weighted combination of the outputs of different network layers, instead of directly using the vector representation produced by the last layer. On the same test datasets, the proposed structure improves performance, achieving F1 scores of 76.57% and 88.24%, which indicates that pretraining-based methods have great potential in natural language processing tasks. In addition, label smoothing is used to mitigate data imbalance, further improving the result on the JNLPBA dataset by 0.14% to 76.61%.

(3) The two proposed models are compared in terms of running time and performance. The multi-task model outperforms the multi-feature model, improving the F1 score by 1.15 and 1.28 percentage points on JNLPBA and NCBI-disease, respectively. In terms of time cost, however, the multi-feature model trains and infers far faster: its average training time is 0.807 minutes per epoch, roughly a fifth of the 4.268 minutes needed by the multi-task model, and its average inference time is 22 ms, about half of the 45 ms required by the multi-task model. In short, the pretraining-based method achieves better performance at a higher time cost, whereas the non-pretrained model is faster but less accurate.

(4) Because simple and efficient entity recognition tools are lacking in the biomedical field, the thesis also develops a lightweight online system that identifies disease entities in a given text. The system is built on the Flask framework and emphasizes simplicity and maintainability while implementing the basic functions, so that it can serve as an auxiliary tool for improving work efficiency.
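
As a rough illustration of the multi-feature input layer in (1), the Python sketch below builds one token's input vector by concatenating a local word vector, a global word vector, character-CNN features, character-LSTM features, and a one-hot POS tag. It is a minimal sketch under stated assumptions, not the thesis's implementation: the embedding tables, filter weights, and the names `char_cnn_features` and `token_features` are hypothetical, concatenation is assumed as the way the features are combined, and the character-LSTM output is passed in precomputed rather than computed here.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def char_cnn_features(word: str, char_emb: Dict[str, Vector],
                      filters: Sequence[Sequence[Vector]]) -> Vector:
    """Hypothetical helper: slide each filter over the word's character
    embeddings and max-pool over positions (one output value per filter).

    Each filter is a window of k vectors, each the size of a character embedding.
    """
    dim = len(next(iter(char_emb.values())))
    zero = [0.0] * dim
    chars = [char_emb.get(c, zero) for c in word]  # unknown characters -> zeros
    features = []
    for filt in filters:
        k = len(filt)
        # Pad short words so every filter sees at least one full window.
        padded = chars + [zero] * max(0, k - len(chars))
        scores = [
            sum(w * x
                for f_vec, c_vec in zip(filt, padded[i:i + k])
                for w, x in zip(f_vec, c_vec))
            for i in range(len(padded) - k + 1)
        ]
        features.append(max(scores))  # max-pooling over window positions
    return features


def token_features(word: str, pos: str,
                   local_vecs: Dict[str, Vector], global_vecs: Dict[str, Vector],
                   char_emb: Dict[str, Vector], filters: Sequence[Sequence[Vector]],
                   char_lstm_vec: Vector, pos_tags: Sequence[str]) -> Vector:
    """Hypothetical helper: concatenate word, character and POS features for one token.

    char_lstm_vec stands in for the final state of a character-level LSTM,
    assumed to be computed elsewhere.
    """
    local = local_vecs.get(word, [0.0] * len(next(iter(local_vecs.values()))))
    glob = global_vecs.get(word, [0.0] * len(next(iter(global_vecs.values()))))
    pos_onehot = [1.0 if tag == pos else 0.0 for tag in pos_tags]
    return local + glob + char_cnn_features(word, char_emb, filters) + char_lstm_vec + pos_onehot
```

For example, `token_features("ab", "NN", {"ab": [0.5]}, {"ab": [0.25]}, {"a": [1.0, 0.0], "b": [0.0, 1.0]}, [[[1.0, 0.0], [0.0, 1.0]]], [0.3], ["NN", "VB"])` returns `[0.5, 0.25, 2.0, 0.3, 1.0, 0.0]`. In a trained model these would be learned tensors, and the per-token vectors would then be fed to the sentence-level LSTM.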
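
The F1 scores quoted above are, for datasets such as JNLPBA and NCBI-disease, typically computed at the entity level with exact span matching. The abstract does not name the evaluation script, so the following sketch assumes BIO tags and exact-match entity-level precision, recall, and F1; `bio_spans` and `entity_f1` are illustrative names, not part of the thesis.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def bio_spans(tags: List[str]) -> Set[Span]:
    """Extract entity spans from BIO tags such as B-Disease, I-Disease, O.

    An I- tag that does not continue an entity of the same type starts a new one.
    """
    spans: Set[Span] = set()
    etype, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if etype is not None:
                spans.add((etype, start, i))  # close the current entity
            etype, start = (tag[2:], i) if tag != "O" else (None, i)
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1 over a list of sentences."""
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_spans(g), bio_spans(p)
        tp += len(gs & ps)  # a predicted entity counts only if type and boundaries match
        n_pred += len(ps)
        n_gold += len(gs)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1
```

With one gold sentence `["B-Disease", "I-Disease", "O", "B-Disease"]` and a prediction that finds only the first entity, precision is 1.0, recall 0.5, and F1 about 0.667.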
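
One common way to realize the dynamically weighted combination of layers described in (2) is a softmax-normalized scalar mix, as popularized by ELMo: each layer gets a learnable scalar, the scalars are normalized with softmax so the weights are positive and sum to 1, and the weighted sum of layer outputs is scaled by a learnable factor gamma. The abstract does not state that the thesis uses exactly this form; the sketch shows only the forward computation for a single token, with hypothetical names and no training loop.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_mix(layer_outputs: Sequence[Vector], layer_logits: Sequence[float],
                       gamma: float = 1.0) -> Vector:
    """Combine one token's hidden vectors from every layer into a single vector.

    layer_outputs[l] is the token's representation at layer l; layer_logits holds
    one (learnable) scalar per layer; gamma is a (learnable) global scale.
    """
    if len(layer_outputs) != len(layer_logits):
        raise ValueError("need exactly one logit per layer")
    weights = softmax(layer_logits)
    dim = len(layer_outputs[0])
    return [gamma * sum(w * h[d] for w, h in zip(weights, layer_outputs))
            for d in range(dim)]
```

With equal logits the result is the plain average of the layers, for example `[2.0, 3.0]` for layers `[1.0, 2.0]` and `[3.0, 4.0]`; a very large logit on the top layer recovers the usual last-layer output, which is the baseline this structure is contrasted with.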
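
Label smoothing, used in (2), replaces the one-hot training target with a softened distribution. The sketch below uses the common formulation in which a fraction epsilon of the probability mass is spread uniformly over all K classes, so the gold class receives 1 - epsilon + epsilon/K and every other class epsilon/K; the loss is the cross-entropy between that distribution and softmax(logits). The smoothing factor used in the thesis is not given in the abstract, so epsilon = 0.1 is an assumed default, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def smoothed_targets(gold: int, num_classes: int, epsilon: float) -> List[float]:
    """Spread epsilon of the probability mass uniformly over all classes."""
    off = epsilon / num_classes
    return [1.0 - epsilon + off if k == gold else off for k in range(num_classes)]


def label_smoothing_loss(logits: Sequence[float], gold: int, epsilon: float = 0.1) -> float:
    """Cross-entropy between the smoothed target distribution and softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))  # stable log-sum-exp
    log_probs = [x - log_z for x in logits]
    targets = smoothed_targets(gold, len(logits), epsilon)
    return -sum(t * lp for t, lp in zip(targets, log_probs))
```

With epsilon = 0 the function reduces to ordinary cross-entropy (for example, log 2 for two equal logits); with epsilon > 0 the model is no longer rewarded for pushing the gold-class probability all the way to 1, which discourages over-confident predictions on frequent labels.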
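
The comparison in (3) can be reproduced with a few lines of arithmetic using only the figures reported in this abstract: the F1 gains are absolute differences in percentage points, and the time ratios show how many times slower the multi-task model is than the multi-feature model.

```python
# Figures as reported in the abstract.
F1 = {  # F1 scores in %
    "JNLPBA": {"multi-features": 75.42, "multi-task": 76.57},
    "NCBI-disease": {"multi-features": 86.96, "multi-task": 88.24},
}
TRAIN_MIN_PER_EPOCH = {"multi-features": 0.807, "multi-task": 4.268}
INFERENCE_MS = {"multi-features": 22, "multi-task": 45}

# Absolute F1 gain (percentage points) of the multi-task model per dataset.
gains = {ds: s["multi-task"] - s["multi-features"] for ds, s in F1.items()}
# How many times slower the multi-task model is.
train_ratio = TRAIN_MIN_PER_EPOCH["multi-task"] / TRAIN_MIN_PER_EPOCH["multi-features"]
infer_ratio = INFERENCE_MS["multi-task"] / INFERENCE_MS["multi-features"]

for ds, gain in gains.items():
    print(f"{ds}: +{gain:.2f} F1 points for the multi-task model")
print(f"training: {train_ratio:.2f}x slower, inference: {infer_ratio:.2f}x slower")
```

Running it prints gains of 1.15 and 1.28 points and ratios of 5.29x for training and 2.05x for inference, matching the "twice the time" statement for inference and showing that training per epoch is more than five times slower.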

Keywords/Search Tags:

biomedical texts, named entity recognition, long short-term memory network, pretraining model, multi-task

Related items

1	Research On Biomedical Named Entity Recognition Based On Machine Learning
2	Research On Biomedical Named Entity Recognition In The Construction Of Precise Medical Knowledge Base
3	Complicated Named Entity Recognition For Biomedical Texts
4	Clinical Named Entity Recognition From Chinese Electronic Medical Records Using A Double-layer Annotation Model
5	Research On Named Entity Recognition And Normalization From Biomedical Text
6	Research On Identification Of Bacteria Named Entity In Biomedical Documents
7	Research Of Biomedical Named Entity Recognition Based On BioBERT
8	Research On Named Entity Recognition And Normalization For Chinese Biomedical Texts
9	Research On Method Of Medical Named Entity Recognition Based On Pre-trained Model
10	Biomedical Named Entity Recognition Based On Local Feature Enhancement