Research on the Key Techniques of Biomedical Text Mining

Posted on: 2020-12-08 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: L Luo
GTID: 1364330578971744 | Subject: Computer application technology
Abstract/Summary:
As the main vehicle for presenting academic research, the biomedical literature has become an important resource in the biomedical domain and provides a rich source of knowledge for biomedical research. Text mining technology that is effective for the needs of this domain can extract biomedical information from the massive literature efficiently and accurately, which will greatly advance research in the life sciences. However, because traditional machine learning models have limited representational power, it is difficult to further improve the performance of text mining methods built on them. In recent years, the rise of deep learning based on neural networks has brought breakthroughs in speech, image and text processing. This dissertation therefore studies the key techniques of biomedical text mining based on deep learning, focusing on three tasks: biomedical text classification, named entity recognition (NER) and relation extraction.

For biomedical text classification, a neural network ensemble approach is proposed to address the limited size of the training set. In this approach, a module pre-trained on a related dataset is incorporated into the neural network model to improve performance, and the ensemble is then built by combining the individual models' outputs with a logistic regression classifier. Biomedical text classification with neural networks has recently attracted increasing attention, but these methods rarely make use of domain knowledge. To exploit such knowledge, a domain knowledge-enriched self-attention convolutional neural network approach is proposed, in which multi-channel convolutional neural network architectures are designed to incorporate knowledge embeddings. The experimental results show that these knowledge embeddings improve the performance of the deep learning models.

For biomedical named entity recognition, most NER methods operate at the sentence level and therefore suffer from tagging inconsistency: the same entity may be labeled differently in different sentences of a document. To address this problem, an attention-based bidirectional Long Short-Term Memory network with a conditional random field layer (BiLSTM-CRF) is proposed for document-level chemical NER. The architecture relies on a novel attention mechanism that captures attention among similar entities at the document level. The experimental results show that the method significantly improves tagging consistency and achieves state-of-the-art performance. Moreover, most existing Chinese NER work follows methods developed for English. To take the characteristics of Chinese characters into account, a Chinese clinical named entity recognition method based on Chinese stroke ELMo is proposed. The stroke ELMo is learned from a language model pre-trained on a large text corpus; it is a contextualized embedding that also encodes the internal structure of Chinese characters. The experimental results show that adding the stroke ELMo improves the performance of the models.

For biomedical relation extraction, common pipelined methods ignore the correlation between the subtasks, and errors in the NER results can degrade relation classification, propagating without any feedback. To alleviate this problem, a neural network-based joint learning approach is proposed for the joint extraction of biomedical entities and relations. Specifically, a tagging scheme that accounts for overlapping relations is proposed to convert the joint extraction task into a tagging problem. A neural network model then tags the text, and entities and their relations are extracted using the proposed extraction rules. This approach fully exploits the dependencies between entities and relations. The experimental results show that the method outperforms the pipelined methods on two datasets and significantly improves the extraction of overlapping relations.
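The ensemble step described above for text classification, where the base models' outputs are combined by a logistic regression classifier, is a form of stacking. The following is a minimal sketch under stated assumptions, not the dissertation's implementation: each base model is assumed to output one positive-class probability per document, the meta-classifier is a plain logistic regression fitted by batch gradient descent, and the names `fit_stacker` and `predict_stacked` as well as the toy scores are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def _linear(weights: Sequence[float], x: Sequence[float]) -> float:
    # weights[0] is the bias; weights[1:] are the per-base-model weights.
    return weights[0] + sum(w * s for w, s in zip(weights[1:], x))


def fit_stacker(scores: Sequence[Sequence[float]], labels: Sequence[int],
                lr: float = 0.5, epochs: int = 5000) -> List[float]:
    """Fit a logistic-regression meta-classifier by batch gradient descent.

    scores[i][k]: positive-class probability from base model k for example i.
    labels[i]: gold label (0 or 1).
    Returns [bias, w_1, ..., w_K].
    """
    n, k = len(scores), len(scores[0])
    weights = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for x, y in zip(scores, labels):
            err = sigmoid(_linear(weights, x)) - y  # d(log loss)/dz
            grad[0] += err
            for j, s in enumerate(x):
                grad[j + 1] += err * s
        # Average the gradient over the batch and take one descent step.
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


def predict_stacked(weights: Sequence[float], x: Sequence[float]) -> float:
    """Ensemble probability that the document belongs to the positive class."""
    return sigmoid(_linear(weights, x))


if __name__ == "__main__":
    # Toy held-out scores from three hypothetical base models.
    train_scores = [[0.9, 0.8, 0.6], [0.2, 0.3, 0.4], [0.7, 0.9, 0.5],
                    [0.1, 0.2, 0.5], [0.8, 0.6, 0.7], [0.3, 0.1, 0.4]]
    train_labels = [1, 0, 1, 0, 1, 0]
    w = fit_stacker(train_scores, train_labels)
    print("weights:", [round(v, 2) for v in w])
    print("P(positive):", round(predict_stacked(w, [0.8, 0.85, 0.55]), 3))
```

In practice the meta-classifier should be fitted on predictions the base models made for data they were not trained on (for example, out-of-fold predictions); otherwise it tends to trust whichever base model has merely memorized the training set.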
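The motivation for document-level attention in chemical NER can be illustrated with plain scaled dot-product self-attention computed over every token of a document instead of a single sentence. This is a generic sketch, not the novel attention mechanism proposed in the dissertation: it assumes the token vectors are already computed (for example, by a BiLSTM), omits learned projections, and the name `document_attention` is hypothetical. In the sketch, two mentions with identical vectors receive identical attention distributions and therefore identical document-level context vectors; real BiLSTM states differ with context, but similar mentions still draw on shared evidence from the whole document.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def document_attention(doc_tokens: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention over every token of a document.

    doc_tokens holds one vector per token for the WHOLE document (all
    sentences concatenated), so a token can attend to mentions in other
    sentences. Simplifying assumption: no learned projections, so queries,
    keys and values are the token vectors themselves.
    """
    dim = len(doc_tokens[0])
    scale = math.sqrt(dim)
    contexts: List[Vector] = []
    for query in doc_tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in doc_tokens]
        weights = softmax(scores)
        # Context vector = attention-weighted average of all token vectors.
        contexts.append([sum(w * tok[j] for w, tok in zip(weights, doc_tokens))
                         for j in range(dim)])
    return contexts


if __name__ == "__main__":
    # Two toy "sentences"; the first token of each is the same mention.
    sentences = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.5, 0.5]]]
    doc = [tok for sent in sentences for tok in sent]
    ctx = document_attention(doc)
    print(ctx[0] == ctx[2])  # True: identical inputs get identical context
```

In a BiLSTM-CRF tagger, one common design concatenates such a context vector with each token's BiLSTM state before the CRF layer; the exact combination used in the dissertation is not described in this abstract.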
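The idea of turning joint entity and relation extraction into sequence tagging can be sketched with a simplified scheme in which each tag combines a BIES position, a relation type and an argument role (1 or 2), for example `B-INDUCES-1`. The sketch decodes such tags into entity spans and then applies one hypothetical extraction rule: each role-1 entity is paired with the nearest role-2 entity of the same relation type. Because it assigns a single tag per token, it cannot represent overlapping relations; the dissertation's scheme is designed to cover them, and its actual tag set and extraction rules are not given in this abstract. The relation label `INDUCES` and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

# (start, end_exclusive, relation_type, role)
Entity = Tuple[int, int, str, str]
Triple = Tuple[str, str, str]


def decode_entities(tags: List[str]) -> List[Entity]:
    """Decode tags of the form POS-RELATION-ROLE (e.g. "B-INDUCES-1") or "O".

    POS is one of B/I/E/S; relation names must not contain "-".
    Malformed spans (I/E without a matching B) are discarded.
    """
    entities: List[Entity] = []
    open_span: Optional[Tuple[int, str, str]] = None  # (start, rel, role)
    for i, tag in enumerate(tags):
        if tag == "O":
            open_span = None
            continue
        pos, rel, role = tag.split("-")
        if pos == "S":
            entities.append((i, i + 1, rel, role))
            open_span = None
        elif pos == "B":
            open_span = (i, rel, role)
        elif open_span is None or open_span[1:] != (rel, role):
            open_span = None
        elif pos == "E":
            entities.append((open_span[0], i + 1, rel, role))
            open_span = None
        # pos == "I" inside a matching span: the span simply continues.
    return entities


def extract_relations(tokens: List[str], tags: List[str]) -> List[Triple]:
    """Hypothetical rule: pair each role-1 entity with the nearest role-2
    entity of the same relation type. One tag per token means an entity can
    take part in only one relation, so overlapping relations are not covered."""
    entities = decode_entities(tags)
    triples: List[Triple] = []
    for s1, e1, rel, role in entities:
        if role != "1":
            continue
        partners = [(abs(s2 - s1), s2, e2) for s2, e2, rel2, role2 in entities
                    if rel2 == rel and role2 == "2"]
        if partners:
            _, s2, e2 = min(partners)
            triples.append((" ".join(tokens[s1:e1]), rel,
                            " ".join(tokens[s2:e2])))
    return triples


if __name__ == "__main__":
    tokens = ["Cisplatin", "induced", "acute", "kidney", "injury", "."]
    tags = ["S-INDUCES-1", "O", "B-INDUCES-2", "I-INDUCES-2",
            "E-INDUCES-2", "O"]
    print(extract_relations(tokens, tags))
    # [('Cisplatin', 'INDUCES', 'acute kidney injury')]
```

The benefit over a pipeline is that a single tagger predicts entity boundaries and relation roles together, so an entity-level mistake is not silently passed on to a separate relation classifier.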
Keywords/Search Tags: Biomedical Literature, Text Classification, Named Entity Recognition, Entity Relation Extraction, Natural Language Processing