Biomedical Entity Relation Extraction Based On Semi-supervised Learning And Deep Learning

Posted on:2017-06-22

Degree:Master

Type:Thesis

Country:China

Candidate:Q L Feng

Full Text:PDF

GTID:2348330488959716

Subject:Computer application technology

Abstract/Summary:

In recent years, the rapid growth of the biomedical literature has driven extensive research on information extraction (IE) from biomedical text. Most of this work to date concerns relation extraction, which in the biomedical domain consists mainly of recognizing the names of biomedical entities (proteins, drugs, diseases, genes, etc.) and extracting the semantic relations between them. This thesis studies three relation types: disease-symptom, symptom-therapeutic substance, and protein-protein. It proposes semi-supervised learning methods to address the lack of labeled data, and deep learning methods to address the cost of manual feature construction, in entity relation extraction.

To address the lack of labeled data for disease-symptom and symptom-therapeutic substance relations, this thesis uses two semi-supervised learning algorithms, Co-Training and Tri-Training, to build a disease-symptom model and a symptom-therapeutic substance model. During training, a feature kernel, a graph kernel, and a tree kernel serve as the input views for both methods. In the Tri-Training method, ensemble learning is used to combine several classifiers. Experimental results show that both algorithms can exploit unlabeled data together with a few labeled examples to improve classification performance, and that Tri-Training outperforms Co-Training.

Semi-supervised relation extraction for disease-symptom and symptom-therapeutic substance pairs requires a large number of manually constructed features, and the quality of these features directly affects the experimental results. Constructing them is also time-consuming and laborious. To solve this problem, this thesis uses a convolutional neural network (CNN) for relation extraction on the same two relation types. The network learns features automatically from the corpus and builds a feature hierarchy, which reduces the cost of manual feature construction. The Tri-Training method is also used to expand the corpus. Experimental results show that the convolutional neural network obtains better results than Tri-Training.

Relation extraction based on semi-supervised learning has two problems. On the one hand, it selects only those unlabeled samples that the classifiers label consistently, which may discard useful information. On the other hand, the samples added to the training set may be labeled incorrectly. To address both problems, this thesis proposes an improved Tri-Training method for protein-protein interaction extraction (PPIE). The method selects the unlabeled samples that the three classifiers label inconsistently and uses active learning to label them. Experimental results show that, compared with other methods, this method achieves better performance, with an F-score of 68.80% on the AIMed corpus.
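The selection step of the improved Tri-Training method can be illustrated with a minimal sketch. The abstract does not give the thesis's implementation, so everything below is an assumption made for illustration: the function names `split_by_agreement` and `label_disputed`, the `oracle` callable standing in for the human annotator used in active learning, and the `budget` limit on queries are all hypothetical.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

Label = Hashable


def split_by_agreement(
    votes: Sequence[Sequence[Label]],
) -> Tuple[List[Tuple[int, Label]], List[int]]:
    """Partition unlabeled samples by classifier agreement.

    votes[k][i] is the label that classifier k assigns to unlabeled sample i.
    Returns (agreed, disputed): agreed holds (index, label) pairs on which
    every classifier gives the same label; disputed holds the indices on
    which at least two classifiers differ.
    """
    agreed: List[Tuple[int, Label]] = []
    disputed: List[int] = []
    for i, labels in enumerate(zip(*votes)):
        if len(set(labels)) == 1:
            agreed.append((i, labels[0]))
        else:
            disputed.append(i)
    return agreed, disputed


def label_disputed(
    disputed: Sequence[int],
    oracle: Callable[[int], Label],
    budget: int,
) -> List[Tuple[int, Label]]:
    """Ask the oracle (hypothetical annotator) to label disputed samples.

    At most `budget` samples are queried, in the order given.
    """
    return [(i, oracle(i)) for i in disputed[:budget]]


# Toy predictions of three classifiers on four unlabeled samples.
votes = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
]
gold = {0: 1, 1: 0, 2: 1, 3: 0}  # stand-in for annotator answers

agreed, disputed = split_by_agreement(votes)
newly_labeled = label_disputed(disputed, oracle=gold.__getitem__, budget=2)
```

In this sketch, samples 1 and 2 are the disputed ones, so they are the ones sent to the oracle, while samples 0 and 3 are the consistently labeled samples that an agreement-based selection would pick. How the thesis ranks disputed samples, or how many it queries per round, is not stated in the abstract.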

Keywords/Search Tags:

Information Extraction, Semi-supervised Learning, Unlabeled data, Convolutional Neural Network, Active Learning

Related items

1	Semi-supervised Learning Based On Information Theory And Functional Dependency Rules Of Probability
2	Based On The Positive And Unlabeled Samples, Semi-supervised Classification
3	Research On Semi-Supervised Support Vector Machine Learning Methods
4	Combining Semi-supervised And Active Learning For Image Retrieval
5	Research On Text Clustering Based On Semi-supervised Learning
6	Research On Semi-supervised Classification Algorithm Based On Integrated Neural Network
7	Research On The Semi-supervised Few-shot Classification Based On Siamese Network And GMM
8	Research And Application Of Convolutional Neural Network In Collaborative Semi-Supervised Classification
9	The Analysis Of The Method For Modulation Recognition Based On Semi-supervised Learning
10	Unlabeled Data Aided Deep Learning Techniques Researches