
Research On Methods Of Drug Information Extraction From Biomedical Texts

Posted on: 2017-04-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S Y Liu
GTID: 1108330503469840
Subject: Computer application technology
Abstract/Summary:
With the development of biomedical research and Internet technologies, the biomedical literature available on the Internet is growing rapidly. Vast amounts of unstructured biomedical text contain a wealth of valuable knowledge. As one widely studied kind of biomedical entity, drugs are important carriers of biomedical knowledge. Drug information extracted from unstructured biomedical texts can not only serve biomedical researchers and health care professionals but also expand existing drug knowledge bases. Therefore, drug information extraction from biomedical texts is receiving more and more attention. At present, drug information extraction focuses mainly on drug name recognition and drug-drug interaction extraction. The performance of existing methods for drug information extraction cannot meet the requirements of practical applications. Therefore, this thesis studies these two problems in depth. The main contents of this thesis are as follows:

First, drug name recognition based on the fusion of multiple semantic features is proposed. Semantic features based on drug name dictionaries are very helpful for recognizing drug names, so they are widely used in machine learning-based drug name recognition methods. However, it is difficult to update drug name dictionaries immediately after new drugs are developed, which limits the semantic features based on them. This thesis observes that large-scale unstructured biomedical literature contains a large number of drug names not covered by drug name dictionaries. To compensate for the limitations of dictionary-based semantic features, this thesis proposes a drug name recognition method based on the fusion of multiple semantic features. This method uses large-scale unstructured biomedical literature to generate semantic features based on word embeddings.
Semantic features based on drug name dictionaries and on word embeddings are used together for drug name recognition. Experimental results show that the drug name recognition method based on the fusion of multiple semantic features achieves better performance than methods using only one kind of semantic feature.

Second, drug name recognition based on feature conjunction and feature selection is proposed. Feature conjunction refers to combining different simple features into conjunction features. Compared with simple features, conjunction features have the advantage that they can capture multiple characteristics of a word. In drug name recognition, there are many ways to conjoin features, which generates too many conjunction features. Moreover, noise in the feature set can degrade the performance of machine learning models. Therefore, apart from n-gram features, existing drug name recognition methods usually use only simple features. To improve the performance of drug name recognition with conjunction features, this thesis proposes a feature generation framework oriented to drug name recognition. The framework consists of a feature conjunction module and a feature selection module. The feature conjunction module combines simple features into conjunction features. The feature selection module removes noise from the feature set. Using the proposed framework, this thesis combines semantic features based on word embeddings and drug dictionaries with general features to form conjunction features. The generated features are used by conditional random fields for drug name recognition. Experimental results show that the drug name recognition method based on feature conjunction and feature selection outperforms methods using simple features.

Third, a text sequence-based convolutional neural network for drug-drug interaction extraction is proposed. The state-of-the-art methods for drug-drug interaction extraction are based on support vector machines.
These methods use many manually defined features and require natural language processing toolkits to generate them. Therefore, the performance of these methods is significantly influenced by the natural language processing toolkits. To reduce this reliance, this thesis proposes a text sequence-based convolutional neural network for drug-drug interaction extraction. The inputs of the proposed method are word embeddings learned by an unsupervised deep learning algorithm and randomly initialized position embeddings. The proposed method learns features through text sequence-based convolution and max pooling and extracts drug-drug interactions with a softmax classifier. Experimental results show that the text sequence-based convolutional neural network outperforms support vector machines for drug-drug interaction extraction.

Fourth, a dependency-based convolutional neural network for drug-drug interaction extraction is proposed. The text sequence-based convolutional neural network neglects long-distance dependencies between words. However, these dependencies are very important for drug-drug interaction extraction. Therefore, this thesis proposes a dependency-based convolutional neural network for drug-drug interaction extraction. Experimental results show that long-distance dependencies between words can improve the performance of drug-drug interaction extraction. However, dependency parsers produce many wrong results when they parse long sentences. These errors can be propagated into the dependency-based convolutional neural network and affect the performance of the model. To avoid error propagation, this thesis merges the extraction results of the text sequence-based and dependency-based convolutional neural networks according to sentence length. Experimental results show that combining the two methods further improves the performance of drug-drug interaction extraction.
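
As a rough illustration of the last two contributions, the sketch below runs a tiny text sequence-based network (word plus position embeddings, convolution over token windows, max pooling, softmax) and then chooses between two models' labels by sentence length. It is a minimal sketch under stated assumptions, not the thesis implementation: the function names, the dimensions, the use of offsets to the two candidate drugs as positions, the random stand-in for pre-trained word embeddings, and the cutoff `max_dep_len` are all hypothetical, and the abstract does not state the actual merging rule.

```python
import math
import random


def softmax(scores):
    """Turn raw class scores into probabilities."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def build_inputs(tokens, drug1_idx, drug2_idx, word_emb, pos_emb):
    """Concatenate each word embedding with two position embeddings.

    Assumption: positions are encoded as offsets to the two candidate drugs.
    """
    return [
        word_emb[tok] + pos_emb[i - drug1_idx] + pos_emb[i - drug2_idx]
        for i, tok in enumerate(tokens)
    ]


def cnn_forward(vectors, filters, window, out_weights):
    """Convolution over token windows, max pooling, then softmax.

    Assumes the sentence has at least `window` tokens.
    """
    # Flatten every window of consecutive token vectors into one input.
    windows = [
        [x for vec in vectors[i:i + window] for x in vec]
        for i in range(len(vectors) - window + 1)
    ]
    # Max pooling: keep each filter's strongest response over all positions.
    pooled = [
        max(math.tanh(sum(w * x for w, x in zip(f, win))) for win in windows)
        for f in filters
    ]
    scores = [sum(w * p for w, p in zip(row, pooled)) for row in out_weights]
    return softmax(scores)


def merge_by_length(tokens, seq_label, dep_label, max_dep_len=30):
    """Use the dependency-based label only for short sentences.

    Assumption: max_dep_len is a hypothetical cutoff, not a thesis value.
    """
    return dep_label if len(tokens) <= max_dep_len else seq_label


rng = random.Random(0)


def rand_vec(n):
    """Small random vector used for every untrained parameter here."""
    return [rng.uniform(-0.1, 0.1) for _ in range(n)]


tokens = ["aspirin", "may", "increase", "the", "effect", "of", "warfarin"]
word_dim, pos_dim, window, n_filters, n_classes = 4, 2, 3, 5, 2

# Random stand-in for word embeddings that would be pre-trained in practice.
word_emb = {tok: rand_vec(word_dim) for tok in sorted(set(tokens))}
pos_emb = {d: rand_vec(pos_dim) for d in range(-len(tokens), len(tokens) + 1)}
filters = [rand_vec(window * (word_dim + 2 * pos_dim)) for _ in range(n_filters)]
out_weights = [rand_vec(n_filters) for _ in range(n_classes)]

vectors = build_inputs(tokens, 0, len(tokens) - 1, word_emb, pos_emb)
probs = cnn_forward(vectors, filters, window, out_weights)
print(probs)
print(merge_by_length(tokens, seq_label="interaction", dep_label="no interaction"))
```

With untrained random weights the probabilities carry no meaning; the point is only the data flow. The length rule reflects the stated motivation that parser errors grow with sentence length, so long sentences fall back to the text sequence-based prediction.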
Keywords/Search Tags: Biomedical texts, Drug information extraction, Drug name recognition, Drug-drug interaction extraction