
Research On Drug Entity Extraction In Biomedical Literature

Posted on: 2017-11-06
Degree: Master
Type: Thesis
Country: China
Candidate: J Z Bu
GTID: 2348330533469229
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technology, an increasing number of research results are being published, and the volume of textual data such as scientific literature and patents is growing exponentially. These texts contain a great deal of information about chemical compounds and drugs, such as target and binding relations, metabolism, enzymatic reactions, potential side effects and therapeutic potential. This information is stored as unstructured data, and obtaining and using it is of great significance for related research and applications. To achieve this goal, the first problem to solve is how to efficiently extract chemical and drug names from massive amounts of unstructured text. To this end, this thesis studies methods for extracting chemical and drug entities from biomedical literature. The work covers three aspects.

First, we built a rich and effective feature set based on the characteristics of chemical and drug entities, and implemented domain-feature-based extraction methods using conditional random fields (CRFs) and structured support vector machines (SSVMs). Guided by an analysis of these characteristics, we built the feature set through comparative experiments; it includes not only basic domain features but also word representation features. On the test set of the BioCreative V CEMP task, the CRF-based system achieved an F1-score of 0.8704 and the SSVM-based system achieved 0.8761.

Second, we studied chemical and drug entity extraction based on deep learning. The performance of traditional machine learning algorithms depends heavily on their features, whereas deep learning can learn features automatically and is applicable to many different fields. This thesis uses the sequential structure of recurrent neural networks (RNNs) to extract chemical and drug entities. The system based on recurrent neural networks with a conditional random field layer achieved an F1-score of 0.8876, outperforming both the CRF-based system and a standard RNN-based system.

Finally, we combined the domain-feature and deep-learning systems by stacked generalization and implemented a stacked-generalization method for chemical and drug entity extraction. The domain-feature and deep-learning methods serve as the primary learners. By analyzing their outputs, we selected a feature set that represents both the diversity and the consistency among them, and used a linear SVM to build the meta-learner. The system based on stacked generalization achieved an F1-score of 0.8906.

In summary, this thesis studies and implements methods for extracting chemical and drug entities from biomedical literature. The experimental results show that our methods can extract chemical and drug entities from unstructured text efficiently...
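The abstract does not enumerate the thesis's domain features. As an illustration only, the following minimal Python sketch computes a few orthographic, affix and context features of the kind commonly used for chemical-name recognition, emitting one feature dict per token, a format many CRF toolkits accept. The suffix list, the feature names and the helper functions are hypothetical, and the word representation features mentioned in the abstract are not shown.

```python
import re
from typing import Dict, List, Union

Feature = Union[str, bool]

# Hypothetical suffix list for illustration; not the thesis's actual feature set.
CHEM_SUFFIXES = ("ine", "ide", "ol", "ate", "ase", "one", "yl")


def word_shape(tok: str) -> str:
    """Map A-Z -> 'A', a-z -> 'a', 0-9 -> '0', then collapse repeats ("CH3" -> "A0")."""
    shape = re.sub(r"[A-Z]", "A", tok)
    shape = re.sub(r"[a-z]", "a", shape)
    shape = re.sub(r"[0-9]", "0", shape)
    return re.sub(r"(.)\1+", r"\1", shape)


def token_features(tokens: List[str], i: int) -> Dict[str, Feature]:
    """Orthographic, affix and one-token context features for tokens[i]."""
    tok = tokens[i]
    low = tok.lower()
    feats: Dict[str, Feature] = {
        "lower": low,
        "shape": word_shape(tok),
        "prefix3": low[:3],
        "suffix3": low[-3:],
        "has_digit": any(c.isdigit() for c in tok),
        "has_hyphen": "-" in tok,
        "mixed_case": any(c.isupper() for c in tok[1:]),  # e.g. "NaCl"
        "chem_suffix": low.endswith(CHEM_SUFFIXES),
    }
    # Context window of one token on each side, with sentence-boundary flags.
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, Feature]]:
    """One feature dict per token, in sentence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


for f in sentence_features(["Ethanol", "and", "NaCl", "."]):
    print(f["lower"], f["shape"], f["chem_suffix"], f["mixed_case"])
```

Pairing each sentence's list of dicts with its gold BIO tags gives the usual training input for a linear-chain CRF; an SSVM can consume the same features after they are vectorized.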
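In an RNN tagger with a CRF output layer, the recurrent network typically produces a score for every tag at every token, and the CRF layer adds tag-to-tag transition scores; decoding then selects the tag sequence with the highest total score using the Viterbi algorithm. The sketch below assumes a BIO tagging scheme and uses hand-written toy scores in place of a trained network; the abstract does not state the thesis's tag set or network configuration.

```python
from typing import List, Sequence

NEG = -1e4  # large negative score used to forbid a transition


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]],
                   start: Sequence[float]) -> List[int]:
    """Return the tag-index sequence with the highest total score.

    emissions[t][j]   score of tag j at token t (e.g. an RNN's output)
    transitions[i][j] score of tag i followed by tag j (the CRF layer)
    start[j]          score of tag j on the first token
    """
    n_tags = len(start)
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backptrs: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score: List[float] = []
        ptrs: List[int] = []
        for j in range(n_tags):
            # Best previous tag i for current tag j.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backptrs.append(ptrs)
    # Trace back from the best-scoring final tag.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for ptrs in reversed(backptrs):
        best = ptrs[best]
        path.append(best)
    return path[::-1]


# Toy example with tags O=0, B=1, I=2 (hand-written scores, not a trained model).
TAGS = ["O", "B", "I"]
emissions = [[2.0, 0.0, 0.0],   # "with"
             [0.0, 1.0, 1.5],   # "sodium" (on its own, slightly prefers I)
             [0.0, 0.5, 2.0]]   # "chloride"
transitions = [[0.0, 0.0, NEG],  # O -> I is invalid in BIO
               [0.0, 0.0, 1.0],
               [0.0, 0.0, 1.0]]
start = [0.0, 0.0, NEG]          # a sentence cannot start with I
print([TAGS[k] for k in viterbi_decode(emissions, transitions, start)])  # → ['O', 'B', 'I']
```

Taking the per-token maximum of the emission scores alone would give O, I, I, which is not a valid BIO sequence; the transition scores rule it out. This is one reason a CRF layer on top of an RNN can help.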
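The abstract states that the meta-learner's features represent the diversity and consistency of the primary learners, but it does not list them. A minimal sketch of one possible design, assuming BIO tags and a hypothetical feature layout: each token's vector contains every learner's one-hot tag, a flag for whether all learners agree, and the share of learners predicting an entity tag.

```python
from typing import List, Sequence

TAGS = ("O", "B", "I")  # assumed BIO tagging scheme


def meta_features(predictions: Sequence[Sequence[str]]) -> List[List[float]]:
    """Build one meta-feature vector per token.

    predictions[k][t] is primary learner k's tag for token t.
    Layout per token (hypothetical): one-hot tag for each learner,
    then an all-agree flag, then the fraction of learners predicting B or I.
    """
    n_learners = len(predictions)
    n_tokens = len(predictions[0])
    if any(len(p) != n_tokens for p in predictions):
        raise ValueError("all learners must tag the same tokens")
    rows: List[List[float]] = []
    for t in range(n_tokens):
        tags_t = [predictions[k][t] for k in range(n_learners)]
        row: List[float] = []
        for tag in tags_t:  # diversity: what each learner said
            row.extend(1.0 if tag == name else 0.0 for name in TAGS)
        row.append(1.0 if len(set(tags_t)) == 1 else 0.0)  # consistency flag
        row.append(sum(tag != "O" for tag in tags_t) / n_learners)  # entity vote share
        rows.append(row)
    return rows


# Placeholder outputs from three hypothetical primary learners on three tokens.
preds = [["O", "B", "I"],   # e.g. a CRF-based system
         ["O", "B", "O"],   # e.g. an SSVM-based system
         ["O", "O", "O"]]   # e.g. an RNN-CRF system
for vec in meta_features(preds):
    print(vec)
```

Vectors like these, paired with the gold tags, could then be used to train a linear SVM such as scikit-learn's `LinearSVC`. To avoid leaking training labels, the primary learners' predictions used for this step are normally produced by cross-validation on the training set.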
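The F1-scores reported above are the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The sketch below computes them assuming exact-match, micro-averaged scoring over entity spans identified by a document id and character offsets. This span format is an assumption for illustration; the official BioCreative V CEMP evaluation may differ in its details.

```python
from typing import Set, Tuple

Span = Tuple[str, int, int]  # (document id, start offset, end offset): assumed format


def prf1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match micro precision, recall and F1 over entity spans."""
    tp = len(gold & pred)  # a prediction counts only if both offsets match exactly
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = {("d1", 0, 7), ("d1", 20, 24), ("d2", 5, 12), ("d2", 30, 38)}
pred = {("d1", 0, 7), ("d2", 5, 12), ("d2", 30, 37)}  # last span has a boundary error
p, r, f = prf1(gold, pred)
print(f"P={p:.4f} R={r:.4f} F1={f:.4f}")  # → P=0.6667 R=0.5000 F1=0.5714
```

Under exact matching, a span with an off-by-one boundary counts as both a false positive and a false negative, so boundary errors on long chemical names can lower scores noticeably.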
Keywords/Search Tags: chemical and drug entity extraction, conditional random fields, structured support vector machines, deep learning, ensemble learning