
Research On Entity Relation Extraction In Biomedical Texts

Posted on: 2018-10-19
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Zheng
Full Text: PDF
GTID: 2348330536960959
Subject: Computer application technology
Abstract/Summary:
Entity relation extraction is an important branch of biomedical information extraction. The Bacteria Biotopes (BB) task helps uncover the mechanisms of association among organisms and supports the development of microbiology, which plays a key role in many fields, such as food-processing safety, health science and waste treatment. However, the performance of existing extraction methods is not satisfactory, so improving performance on the Bacteria Biotopes task is the focus of this thesis.

Starting from common features, we construct a simple and efficient SVM-based extraction system and improve its performance with word representations, Brown clustering features, external resource features and entity type features. Firstly, to take biomedical domain information into account, we use biomedical domain-specific word representations to learn latent semantic information from a background corpus. Secondly, the Brown clustering method is applied to sentences, grouping similar entity pairs into clusters, and the clustering results are used as features. Thirdly, we use species and term information as external resource features. Then, because entities that appear in different positions often have different meanings (for example, an entity that appears in the title differs in importance from one that appears elsewhere), this positional information is also used as a feature. Finally, our method obtains a 49.11% F-score on the test set of the BioNLP-ST 2016 BB task.

SVM methods can draw on domain experts' experience to design hand-crafted features that capture explicit knowledge. However, such features may hurt generalization performance and lead to over-design. Since deep learning methods can capture hidden, deep semantic information by iteratively training a neural network, we explore an architecture that we call DET-BLSTM. Firstly, the Shortest Path enclosed Tree (SPT) between two entities is obtained with the GDep parser to extract the informative words. To obtain more information, the SPT is extended to the dynamic extended tree (DET), which encodes the input information more accurately. Secondly, we map the DET to embeddings, including word embeddings, POS embeddings and distance embeddings. Thirdly, bidirectional LSTM networks read the DET in the forward and backward directions respectively, and Softmax is used for classification. Finally, because the SVM can exploit domain experts' experience while the LSTM can capture deep semantic information, we combine the two methods: the predictions of the SVM are used for post-processing to improve performance.

The experimental results on the BioNLP'16 BB-event corpus show that our dynamic extended tree conditioned BLSTM (DET-BLSTM) with post-processing achieves an F-score of 58.15%, which is better than the currently best BB-event system. In conclusion, we adopt both shallow and deep methods for relation extraction. Our DET-BLSTM model with post-processing outperforms all official submissions to BioNLP-ST 2016 and is 2.35% higher than the best system.
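
As a rough illustration of the first DET-BLSTM step described above, the sketch below finds the shortest dependency path between two entity tokens and derives the relative offsets that distance embeddings would be looked up from. It is a minimal sketch under stated assumptions, not the thesis's implementation: the sentence and its dependency edges are a hand-written toy example rather than GDep parser output, the function names `shortest_dependency_path` and `relative_offsets` are hypothetical, only the path itself is computed (not the full enclosed subtree), and the rule that extends the SPT into the DET is not given in the abstract and is not reproduced here.

```python
from collections import deque

def shortest_dependency_path(edges, source, target):
    """Return token indices on the shortest path between two tokens.

    `edges` is a list of (head, dependent) index pairs; the tree is
    treated as undirected so the path may go up and then down.
    """
    neighbours = {}
    for head, dependent in edges:
        neighbours.setdefault(head, []).append(dependent)
        neighbours.setdefault(dependent, []).append(head)

    previous = {source: None}          # back-pointers for path recovery
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for nxt in neighbours.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)

    if target not in previous:         # tokens are not connected
        return []
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]

def relative_offsets(path, entity1, entity2):
    """Signed sentence-position offsets of each path token to both entities."""
    return [(i - entity1, i - entity2) for i in path]

# Toy sentence (assumed, not from the BB corpus).
tokens = ["Lactobacillus", "was", "isolated", "from", "raw", "milk"]
# Hand-written (head, dependent) edges; a real system would take these from a parser.
edges = [(2, 0), (2, 1), (2, 5), (5, 3), (5, 4)]

path = shortest_dependency_path(edges, source=0, target=5)
words_on_path = [tokens[i] for i in path]
offsets = relative_offsets(path, entity1=0, entity2=5)
```

In this toy case the path keeps only the bacterium, the verb and the habitat head, which is the kind of informative-word selection the abstract attributes to the SPT; the offsets would then index a distance-embedding table alongside the word and POS embeddings.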
Keywords/Search Tags: Word Embedding, relation extraction, rich feature, dynamic extended tree, Long Short-Term Memory