Research On Entity Relation Extraction From Biomedical Text

Posted on: 2018-06-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z H Zhao
Full Text: PDF
GTID: 1318330542469131
Subject: Computer software and theory
Abstract/Summary:
Academic literature is the most important way for researchers to present and share their academic achievements. The massive body of biomedical literature has become a vast store of biomedical knowledge and therefore one of the most important resources for biomedical research. Biomedical relation extraction uses natural language processing, machine learning, and deep learning techniques to extract relations between biomedical entities (genes, chemicals, diseases, etc.) from the biomedical literature efficiently and accurately. As a useful complement to laboratory work, biomedical relation extraction and related research can assist biomedical researchers, inspire biological experiments, and be applied widely in life science research. Because biomedical named entity recognition (NER) is the foundation of biomedical relation extraction, it also attracts researchers' attention. This thesis therefore studies biomedical named entity recognition, biomedical relation extraction, and biomedical relation triple extraction. Its main contents cover the following three aspects.

For biomedical named entity recognition, a multiple label convolutional neural network (ML-CNN) method is proposed. ML-CNN treats named entity recognition as a word-level classification problem, whereas other methods, such as the conditional random field (CRF), treat it as a sentence-level sequence tagging problem. Given a word, only a fixed-size window of the words around it is fed into the ML-CNN model. A multiple label strategy (MLS), which suits the word-level classification architecture, is proposed to capture the dependency information between labels, and it simplifies the process of learning that information. Compared with the CRF method, ML-CNN requires less feature engineering, which improves its generalization ability. ML-CNN achieves satisfactory performance on the disease NER problems (CDR and NCBI corpora) and the chemical NER problem (CHEMDNER corpus).

For biomedical relation extraction, a syntactic convolutional neural network (SCNN) model is proposed. A syntax word embedding is proposed to represent each sample with richer information. In addition, SCNN encodes the one-hot feature vectors into distributed feature vectors and then combines them with the other distributed feature vectors; this encoding step improves the combination of the features. The SCNN method achieves state-of-the-art performance on the DDIExtraction 2013 corpus.

For biomedical relation triple extraction, a hybrid method is proposed. It divides the problem into three sub-problems. First, ML-CNN is used to recognize biomedical entities in the biomedical literature. Then, SCNN is used to extract the entity pairs that interact with each other from among the recognized entities. Finally, a rule-based method identifies the interaction words that represent the relationship type of each entity pair, producing the triple (entity1, interaction word, entity2). By integrating machine learning methods with the rule-based method, the hybrid method overcomes the low recall of the Open IE method. Experiments on the PPI corpus (AImed) achieved state-of-the-art performance.
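
To make the word-level formulation of ML-CNN more concrete, the following minimal Python sketch shows one way a fixed-size window of surrounding words could be built for each word in a sentence. It is an illustration only, not the thesis's implementation: the function name `context_windows`, the `half_width` parameter, the `<PAD>` token, and the example sentence are all assumptions, since the abstract does not give the window size or padding scheme, and the CNN classifier itself is omitted.

```python
from typing import List

PAD = "<PAD>"  # hypothetical padding token; the abstract does not specify one


def context_windows(tokens: List[str], half_width: int = 2,
                    pad: str = PAD) -> List[List[str]]:
    """Return one fixed-size window per token, centred on that token.

    Each window holds `half_width` words on either side of the target
    word, so every window has length 2 * half_width + 1. Sentence
    boundaries are filled with the padding token.
    """
    if half_width < 0:
        raise ValueError("half_width must be non-negative")
    padded = [pad] * half_width + list(tokens) + [pad] * half_width
    size = 2 * half_width + 1
    # Window i starts at padded[i], which places tokens[i] at its centre.
    return [padded[i:i + size] for i in range(len(tokens))]


# Illustrative sentence (not taken from any corpus named in the abstract).
sentence = "Aspirin may cause gastric bleeding".split()
windows = context_windows(sentence, half_width=1)
for word, window in zip(sentence, windows):
    print(word, window)
```

Under this framing, each window would be classified independently, which is what distinguishes the word-level approach from sentence-level sequence tagging with a CRF.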
Keywords/Search Tags: Biomedical literature, Named entity recognition, Relation extraction, Convolutional neural network, Deep learning