
Research On Biomedical Named Entity Recognition In The Construction Of Precise Medical Knowledge Base

Posted on: 2020-11-21
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Luo
GTID: 2404330599952383
Subject: Biomedical engineering
Abstract/Summary:
Precision medicine is a cutting-edge medical concept: by analyzing a patient's biomedical data, it constructs a knowledge graph that reveals the individual's disease mechanism, and then develops a personalized diagnosis and treatment plan. A biomedical knowledge base with the gene-variation-disease relationship as its core content plays an irreplaceable role in the scientific research and clinical practice of precision medicine. Given the massive and rapidly growing biomedical literature, manual knowledge extraction consumes a great deal of time and manpower. The use of machine learning to mine biomedical text automatically has therefore become a key link in the construction of precision medical knowledge bases.

Biomedical named entity recognition is the task of automatically recognizing the names of specified biomedical entity types in text, and it is a fundamental and crucial step in mining knowledge from the biomedical literature. In the context of constructing a precision medical knowledge base, this thesis systematically studies methods and techniques for recognizing three types of named entities in the biomedical literature: genes, gene mutations and diseases. It proposes a new mutation recognition algorithm model based on deep neural networks and traditional methods, and develops a software system that identifies and labels the three entity types. The key research findings are as follows.

1. The research status of biomedical named entity recognition algorithms is investigated, and the various models used in these algorithms are studied. The survey found that current mainstream disease and mutation recognition models are mostly based on statistical machine learning algorithms, which require complex manual feature engineering and rely on the designer's domain knowledge and natural language processing experience. Gene named entity recognition algorithms, by contrast, are fairly mature, and many recognition tools exist. Systems that can identify genes, gene mutations and diseases together are rarely published. Easy-to-implement, high-performance algorithms and multi-entity recognition systems still need to be developed.

2. An innovative mutation recognition algorithm combining a deep neural network with a traditional statistical model is proposed and implemented. Using the deep word segmentation strategy proposed by the author, words are segmented according to capitalization, numbers and special symbols, and the resulting tokens are trained into token embeddings that can capture the internal structure of mutation entities. The token embeddings are fed into a bi-directional Long Short-Term Memory (Bi-LSTM) network to obtain a single vector representation of each word. The sequence of word vectors is then fed into a next-level Bi-LSTM network followed by two fully connected layers, which output the probability of each label for each word. To further improve recognition performance, the Viterbi algorithm is used to optimize the neural network output, and the result is then combined with the output of regular expression matching to obtain the final labeling. The algorithm achieved an F-value of 91.59% on the tmVar mutation corpus, which is higher than other reported systems.

3. To enable rapid identification and labeling of the three entity types (genes, gene mutations and diseases) in biomedical texts, existing gene recognition algorithms are combined with the disease and mutation recognition algorithms developed by the research group to construct a system that automatically recognizes gene, mutation and disease entities. The system tags the input text with the different recognizers in parallel, then uses the longest sequence coverage method to merge their outputs. The system is simple to operate and recognizes the target entities quickly and accurately, laying a foundation for relationship extraction between the target entities.
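The deep word segmentation step in point 2 (splitting words by capitalization, numbers and special symbols) can be illustrated with a minimal sketch. The abstract does not give the exact splitting rules, so the rule used here is an assumption: runs of uppercase letters, runs of lowercase letters, runs of digits, and each remaining symbol become separate tokens. The function name `deep_segment` is hypothetical and is not taken from the thesis.

```python
import re
from typing import List

# Assumed rule (not specified in the abstract): a token is a run of ASCII
# uppercase letters, a run of ASCII lowercase letters, a run of digits,
# or any single remaining non-whitespace character.
_TOKEN_PATTERN = re.compile(r"[A-Z]+|[a-z]+|[0-9]+|[^A-Za-z0-9\s]")


def deep_segment(word: str) -> List[str]:
    """Split one word into sub-word tokens (hypothetical helper)."""
    return _TOKEN_PATTERN.findall(word)


if __name__ == "__main__":
    # A mutation mention in HGVS-like notation is broken into its parts.
    print(deep_segment("c.1799T>A"))  # ['c', '.', '1799', 'T', '>', 'A']
    print(deep_segment("V600E"))      # ['V', '600', 'E']
```

Under this assumed rule, the position number and the reference and variant bases of a mutation mention end up as separate tokens, which is the kind of internal structure the token embeddings are said to capture.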
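The longest sequence coverage method in point 3 is named but not specified in the abstract. One plausible reading, sketched below as an assumption, is that when spans produced by different recognizers overlap, the longest span is kept and the overlapping shorter ones are discarded. The `Span` type, the function name `merge_longest` and the tie-breaking rule (earlier start wins) are all hypothetical.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # e.g. "Gene", "Mutation", "Disease"


def merge_longest(spans: List[Span]) -> List[Span]:
    """Resolve overlaps by keeping the longest span (assumed interpretation)."""
    kept: List[Span] = []
    # Visit longer spans first; ties are broken by earlier start offset.
    for span in sorted(spans, key=lambda s: (-(s.end - s.start), s.start)):
        # Keep the span only if it does not overlap any span already kept.
        if all(span.end <= k.start or span.start >= k.end for k in kept):
            kept.append(span)
    # Return the surviving spans in reading order.
    return sorted(kept, key=lambda s: s.start)


if __name__ == "__main__":
    # Offsets refer to the text "BRAF V600E mutation in melanoma".
    candidates = [
        Span(0, 4, "Gene"),        # "BRAF" from a gene recognizer
        Span(0, 10, "Mutation"),   # "BRAF V600E" from a mutation recognizer
        Span(23, 31, "Disease"),   # "melanoma" from a disease recognizer
    ]
    for span in merge_longest(candidates):
        print(span)
```

In this example the shorter "Gene" span is dropped because it lies inside the longer "Mutation" span. A real system might instead keep nested entities or weight recognizers differently; the abstract gives no detail on this.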
Keywords/Search Tags:Biomedical named entity recognition, token embedding, Long Short-Term Memory, Auto-recognizing system