
Biomedical Relation Extraction Based On Word Embedding And Deep Learning

Posted on: 2017-07-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z C Jiang
GTID: 1318330512461463
Subject: Computer application technology
Abstract/Summary:
Protein-Protein Interaction Extraction (PPIE) and Drug-Drug Interaction Extraction (DDIE) are important for biomedical database construction, life-science research, drug development and medical care. Much previous work on biomedical relation extraction has concentrated on selecting feature sets and designing kernel functions, approaches that now leave little room for further improvement. To push performance further, this thesis explores new methods based on word embedding and deep learning: deep models are built for relation extraction, and word embeddings supply the semantic information on which these models depend. The main original contributions are as follows.

First, taking the characteristics of biomedical text into account, this thesis incorporates words, POS tags, stems, syntactic chunks and biomedical entities into a novel word representation model to improve the semantic representation of word vectors. Different word representation models are compared in experiments on biomedical named entity recognition, event trigger word recognition, protein-protein interaction extraction and drug-drug interaction extraction. These experiments verify the effectiveness of the novel model, which provides a sound foundation for deep-learning-based relation extraction methods.

Second, for PPIE, this thesis proposes an instance-representation-based method with three components: word representation, skeleton features and vector composition. Designed around the characteristics of PPI instances, the method takes word vectors as input and combines them with skeleton features and vector composition, carrying the semantic representation capacity of word vectors into the instance representation model. The experimental results show that word vectors and deep learning are effective for PPIE.

Third, this thesis proposes a two-stage Drug-Drug Interaction Extraction method.
In the first stage, a feature-based logistic regression classifier identifies the positive DDI instances; in the second stage, an LSTM-based classifier assigns each positive instance to one of four DDI types. To make full use of the representation capacity of the LSTM, the thesis weighs the importance, implementation cost and computational cost of many candidate factors in the second stage. The experimental results show that the factors that improve second-stage performance are word vectors, distance vectors, POS vectors and a two-layer bidirectional LSTM, and that these factors are also important for the two-stage method as a whole.

In summary, this thesis addresses binary protein-protein interaction extraction and multi-class drug-drug interaction extraction using word embedding and deep learning techniques. The proposed models overcome the limitations of feature-based and kernel-based methods and achieve state-of-the-art performance. Word embedding and deep learning have been research focuses in recent years, but their adoption in biomedical text mining started late. This work makes progress on biomedical relation extraction and shows that biomedical text mining based on word representation and deep learning has broad prospects that merit further study in future work.
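The word representation model combines several kinds of token information (word, POS tag, stem, syntactic chunk, biomedical entity). The abstract does not say how they are combined, so the sketch below assumes the simplest option: one lookup table per feature type, concatenated into a single token vector. The thesis's model may instead learn these features jointly while training the embeddings. The feature dimensions, vocabularies and the `CompositeEmbedding` class are hypothetical, and the tables are randomly initialised rather than trained.

```python
import numpy as np

# Dimension of each feature embedding (illustrative values, not taken from the thesis).
FEATURE_DIMS = {"word": 8, "pos": 3, "stem": 4, "chunk": 2, "entity": 2}

class CompositeEmbedding:
    """Concatenates one embedding per feature type into a single token vector."""

    def __init__(self, vocabs, dims, seed=0):
        rng = np.random.default_rng(seed)
        self.vocabs = vocabs          # feature name -> {value: index}
        self.order = list(vocabs)     # fixed concatenation order
        # One lookup table per feature; row 0 is reserved for unseen values.
        self.tables = {name: rng.normal(0.0, 0.1, size=(len(vocab) + 1, dims[name]))
                       for name, vocab in vocabs.items()}

    def row(self, name, value):
        return self.vocabs[name].get(value, -1) + 1  # unseen value -> row 0

    def token_vector(self, token):
        return np.concatenate([self.tables[n][self.row(n, token[n])] for n in self.order])

# Hypothetical toy vocabularies for a PPI-style sentence.
vocabs = {
    "word": {"IL-2": 0, "binds": 1, "receptor": 2},
    "pos": {"NN": 0, "VBZ": 1},
    "stem": {"il-2": 0, "bind": 1, "receptor": 2},
    "chunk": {"B-NP": 0, "B-VP": 1},
    "entity": {"PROTEIN": 0, "O": 1},
}
emb = CompositeEmbedding(vocabs, FEATURE_DIMS)
v = emb.token_vector({"word": "binds", "pos": "VBZ", "stem": "bind", "chunk": "B-VP", "entity": "O"})
print(v.shape)  # (19,)
```

Concatenation keeps each information source in its own slice of the vector, so a downstream model can weight them separately; the resulting vector length is simply the sum of the per-feature dimensions.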
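The PPIE method turns a variable-length sentence into one fixed-length instance vector by combining word vectors, skeleton features and vector composition. The abstract does not define the skeleton, so the sketch below uses a hypothetical five-part skeleton (tokens before the first protein, the first protein, tokens between the proteins, the second protein, tokens after it) and averaging as the composition function. The function names are illustrative only; the output vector would then be fed to a classifier.

```python
import numpy as np

def mean_or_zeros(vectors, dim):
    # Averaging is one simple composition function; an empty segment yields a zero vector.
    return np.mean(vectors, axis=0) if len(vectors) else np.zeros(dim)

def instance_vector(token_vecs, p1, p2):
    """Compose one fixed-length vector for a candidate protein pair.

    token_vecs: array of shape (n_tokens, dim); p1 < p2 are the positions of the two
    protein mentions. The five-part skeleton below is a hypothetical choice.
    """
    token_vecs = np.asarray(token_vecs, dtype=float)
    dim = token_vecs.shape[1]
    segments = [
        token_vecs[:p1],            # context before protein 1
        token_vecs[p1:p1 + 1],      # protein 1
        token_vecs[p1 + 1:p2],      # context between the proteins
        token_vecs[p2:p2 + 1],      # protein 2
        token_vecs[p2 + 1:],        # context after protein 2
    ]
    return np.concatenate([mean_or_zeros(s, dim) for s in segments])

# Toy usage: 6 tokens with 2-d vectors, proteins at positions 1 and 4.
sent = np.arange(12, dtype=float).reshape(6, 2)
x = instance_vector(sent, 1, 4)
print(x.tolist())  # [0.0, 1.0, 2.0, 3.0, 5.0, 6.0, 8.0, 9.0, 10.0, 11.0]
```

Because each skeleton segment is composed separately, the instance vector always has length `5 * dim` regardless of sentence length, which is what allows a standard classifier to consume it.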
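The two-stage DDIE design can be sketched as a filter followed by a typer: a logistic regression classifier first discards pairs predicted to be negative, and only the survivors reach the multi-class second stage. In this sketch the logistic regression is trained by plain batch gradient descent on hand-made features, and the second stage is a stand-in function rather than the LSTM classifier. The four type names assume the DDIExtraction 2013 label set; the function names and the 0.5 threshold are hypothetical.

```python
import numpy as np

DDI_TYPES = ["mechanism", "effect", "advice", "int"]  # assumed label set (DDIExtraction 2013 corpus)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=200):
    """Stage 1: batch gradient descent on the logistic (log) loss."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        grad = sigmoid(X @ w + b) - y  # d(log loss)/d(logit) for each instance
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def two_stage_predict(instances, X, stage1, stage2, threshold=0.5):
    """Stage 1 discards predicted negatives; stage 2 types the remaining candidate pairs."""
    w, b = stage1
    probs = sigmoid(np.asarray(X, dtype=float) @ w + b)
    return [stage2(inst) if p >= threshold else "negative"
            for inst, p in zip(instances, probs)]

# Toy usage: one hand-made feature per candidate drug pair.
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0, 0, 1, 1])
model = train_logreg(X, y)
stand_in_stage2 = lambda inst: DDI_TYPES[1]  # placeholder for the LSTM-based type classifier
preds = two_stage_predict(["pair0", "pair1", "pair2", "pair3"], X, model, stand_in_stage2)
print(preds)  # ['negative', 'negative', 'effect', 'effect']
```

Splitting the task this way lets a cheap binary filter handle the many negative pairs, so the more expensive sequence model only has to separate the four positive types.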
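The second-stage factors reported as helpful (word vectors, distance vectors, POS vectors and a two-layer bidirectional LSTM) can be combined as in the forward-pass sketch below. Each token's input is its word vector, the embeddings of its clipped relative distance to each drug mention, and its POS vector; two stacked BiLSTM layers encode the sentence; a softmax scores the four DDI types. The clipping range, dimensions, the use of final forward and backward states as the sentence summary, and all names are assumptions, and the weights are random and untrained (training is omitted).

```python
import numpy as np

rng = np.random.default_rng(0)
MAX_DIST = 10  # relative distances are clipped to [-MAX_DIST, MAX_DIST] (assumed)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def distance_ids(n_tokens, entity_pos):
    # Signed offset of each token from a drug mention, shifted to a non-negative row index.
    offsets = np.clip(np.arange(n_tokens) - entity_pos, -MAX_DIST, MAX_DIST)
    return offsets + MAX_DIST

class LSTMLayer:
    """One-direction LSTM with random, untrained weights (gate order: i, f, o, g)."""

    def __init__(self, in_dim, hid):
        self.hid = hid
        self.W = rng.normal(0.0, 0.1, size=(4 * hid, in_dim + hid))
        self.b = np.zeros(4 * hid)

    def run(self, xs):
        h, c = np.zeros(self.hid), np.zeros(self.hid)
        outs = []
        for x in xs:
            i, f, o, g = np.split(self.W @ np.concatenate([x, h]) + self.b, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
            outs.append(h)
        return np.stack(outs)

class BiLSTM:
    """Runs one LSTM left-to-right and another right-to-left, concatenating their states."""

    def __init__(self, in_dim, hid):
        self.hid = hid
        self.fwd = LSTMLayer(in_dim, hid)
        self.bwd = LSTMLayer(in_dim, hid)

    def run(self, xs):
        forward = self.fwd.run(xs)
        backward = self.bwd.run(xs[::-1])[::-1]  # realign backward states with token order
        return np.concatenate([forward, backward], axis=1)

def classify(word_vecs, pos_vecs, drug1, drug2, dist_table, layers, W_out):
    """Returns a probability for each DDI type for one candidate drug pair."""
    n = len(word_vecs)
    x = np.concatenate([word_vecs,
                        dist_table[distance_ids(n, drug1)],
                        dist_table[distance_ids(n, drug2)],
                        pos_vecs], axis=1)
    for layer in layers:  # stacked bidirectional layers
        x = layer.run(x)
    hid = layers[-1].hid
    summary = np.concatenate([x[-1, :hid], x[0, hid:]])  # final forward + final backward state
    logits = W_out @ summary
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()

# Toy usage: a 7-token sentence with drug mentions at positions 1 and 5.
n, word_dim, pos_dim, dist_dim, hid = 7, 8, 3, 2, 5
dist_table = rng.normal(0.0, 0.1, size=(2 * MAX_DIST + 1, dist_dim))
in_dim = word_dim + 2 * dist_dim + pos_dim
layers = [BiLSTM(in_dim, hid), BiLSTM(2 * hid, hid)]  # two-layer bidirectional LSTM
W_out = rng.normal(0.0, 0.1, size=(4, 2 * hid))       # four DDI types
probs = classify(rng.normal(size=(n, word_dim)), rng.normal(size=(n, pos_dim)),
                 1, 5, dist_table, layers, W_out)
```

The distance vectors tell the encoder where the two drugs sit in the sentence, which plain word vectors cannot express; the second BiLSTM layer reads the first layer's concatenated states, so its input size is `2 * hid`.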
Keywords/Search Tags: Word Embedding, Deep Learning, Protein-Protein Interaction Extraction, Drug-Drug Interaction Extraction