
Research On Identification Of Bacteria Named Entity In Biomedical Documents

Posted on: 2019-09-19
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Wang
GTID: 2404330548467497
Subject: Computer application technology
Abstract/Summary:
Interactions between microorganisms, especially bacteria, determine the organization and function of microecological communities, and thereby affect human health and the function of environmental ecosystems. Obtaining bacterial interactions experimentally is expensive, and with the growing accumulation of high-throughput sequencing data, inferring bacterial interactions computationally has become an active research topic. However, the lack of standard interaction data sets makes it difficult to evaluate and verify these computational methods. At the same time, the biomedical literature contains a large number of experimentally validated bacterial interactions, and mining them quickly and accurately from such a vast body of text is a new problem. Compared with previously studied biomedical entities, bacterial named entities have distinctive properties: complex types, the continual emergence of new entities, polysemy, and frequent entity nesting. These properties make bacterial named entity recognition a complex task. This thesis studies two approaches, one based on conditional random fields (CRF) combined with bacterial dictionaries and one based on deep learning, both of which achieved good recognition results. The main work and contributions are as follows:

(1) Bacterial named entity recognition based on conditional random fields and bacterial dictionaries. With reference to the classic Genia corpus V3.02, we annotated more than one thousand texts for bacterial named entity recognition. The bacteria dictionary was constructed from UMLS. Based on the distinctive conventions of bacterial naming, 42 features were designed by hand. The model was trained with the CRF algorithm, and the best feature set was selected by comparing feature combinations. We compared its performance with that achieved by CRF-based named entity recognition in other fields, and also trained an SVM model (a classification algorithm commonly used in biology) for comparison. To address the inefficiency of processing large-scale data, a bacterial named entity recognition system based on the Spark distributed platform was proposed to improve speed.

(2) Bacterial named entity recognition based on deep learning. The features used in supervised machine learning methods must be designed and selected by hand, which requires prior domain knowledge. Such features are closely tied to the problem being solved and are not universally applicable, and model performance depends largely on the data representation. Continually designing better features takes considerable time and effort. To address these problems, this thesis proposes a bacterial named entity recognition system that combines a bidirectional LSTM network with a conditional random field (BI-LSTM-CRF). After training, validation, and evaluation, the F1-measure reached 86.718%. The experimental results show that the BI-LSTM-CRF system requires no manual feature extraction and less programming effort, and that it predicts better than the author's earlier CRF and dictionary-based system.

The bacterial named entity recognition systems proposed in this thesis offer good speed and performance, and can quickly and effectively identify bacterial named entities in large collections of biomedical literature. This work lays a foundation for extracting bacterial interactions from the medical literature.
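The abstract reports an F-measure of 86.718% without stating how it was computed. As a rough illustration only, the sketch below shows one common way to score a named entity tagger: exact-match, entity-level precision, recall, and F1 over BIO tags. The BIO scheme, the single entity type, the function names `bio_to_spans` and `entity_f1`, and the example sentence are all assumptions made for this sketch; none of them are taken from the thesis.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (one entity type assumed)."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:      # close the entity that was open
                spans.add((start, i))
            start = i                  # open a new entity
        elif tag == "I":
            if start is None:          # stray "I" with no "B": open an entity here
                start = i
        else:                          # "O" closes any open entity
            if start is not None:
                spans.add((start, i))
                start = None
    if start is not None:              # entity running to the end of the sentence
        spans.add((start, len(tags)))
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 between gold and predicted tags."""
    gold_spans = bio_to_spans(gold)
    pred_spans = bio_to_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical sentence: "Escherichia coli inhibits Bacillus subtilis growth"
gold = ["B", "I", "O", "B", "I", "O"]
pred = ["B", "I", "O", "B", "O", "O"]  # second entity truncated by the tagger
print(entity_f1(gold, pred))
```

In this example the tagger recovers "Escherichia coli" exactly but truncates "Bacillus subtilis", so under exact matching only one of the two predicted spans counts as correct. Token-level or partial-match scoring would give different numbers, which is why the scoring convention matters when comparing reported results.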
Keywords/Search Tags: Text Mining, Bacteria Named Entity Recognition, Conditional Random Fields, Microbial Interactions, Bidirectional Long Short-Term Memory Networks