The Protein-Protein Interaction Based On Transfer Learning And Word Representation

Posted on:2016-08-27

Degree:Master

Type:Thesis

Country:China

Candidate:R Guo

Full Text:PDF

GTID:2180330461478630

Subject:Computer application technology

Abstract/Summary:

As a fundamental task in biomedical text mining, Protein-Protein Interaction (PPI) extraction has considerable research significance and application value, and has received increasing attention from researchers in recent years. Current research on PPI extraction generally adopts statistical machine learning methods and has achieved acceptable results. However, current methods still suffer from two difficulties: one is the lack of annotated data; the other is the vocabulary gap and data sparseness in feature representation. First, insufficient annotated data leads to lower efficiency, and manual data annotation usually requires large and expensive experiments. Second, one-hot encoding, which is widely used for feature representation in traditional machine learning methods for PPI extraction, ignores word order and semantic information and cannot express latent relations between words, which limits PPI extraction performance. To address these problems, this thesis conducts research in the following two aspects:

(1) We introduce transfer learning to address the shortage of annotated data, and propose an improved algorithm called "DisTrAdaboost" to avoid "negative transfer". To overcome the lack of training data, we use instance-based transfer learning to boost PPI extraction performance. Because of the distribution difference between data domains, the existing TrAdaboost algorithm converges too slowly. In contrast, our DisTrAdaboost algorithm accelerates convergence by adjusting the initial weights according to the relative distribution.
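As a rough illustration of instance-based transfer, the sketch below shows one round of the standard TrAdaBoost weight update together with a distance-based initial weighting of source-domain instances. The initial-weight rule (`initial_source_weights`) is a hypothetical stand-in and not the DisTrAdaboost formula, which this abstract does not give; the function names and the toy data are assumptions made for the example.

```python
import math

def initial_source_weights(source_X, target_X):
    """Hypothetical initialisation (NOT the thesis's formula): weight each
    source instance by its closeness to the target-domain centroid."""
    dim = len(target_X[0])
    centroid = [sum(x[d] for x in target_X) / len(target_X) for d in range(dim)]
    raw = [math.exp(-math.dist(x, centroid)) for x in source_X]
    total = sum(raw)
    return [r / total for r in raw]

def tradaboost_round(w_src, w_tgt, miss_src, miss_tgt, n_rounds):
    """One weight update of standard TrAdaBoost.
    miss_* are 0/1 lists: 1 where the weak learner misclassified."""
    n = len(w_src)
    # The error is measured on target-domain instances only.
    eps = sum(w * m for w, m in zip(w_tgt, miss_tgt)) / sum(w_tgt)
    eps = min(max(eps, 1e-10), 0.499)  # keep beta_t well defined
    beta_t = eps / (1.0 - eps)
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / n_rounds))
    # Misclassified source instances are down-weighted,
    # misclassified target instances are up-weighted.
    new_src = [w * beta ** m for w, m in zip(w_src, miss_src)]
    new_tgt = [w * beta_t ** (-m) for w, m in zip(w_tgt, miss_tgt)]
    return new_src, new_tgt

# Toy data (made up): the first source instance lies near the target domain.
source_X = [[0.0, 0.0], [5.0, 5.0]]
target_X = [[0.0, 1.0], [1.0, 0.0]]
w_src = initial_source_weights(source_X, target_X)
w_tgt = [0.25, 0.75]
new_src, new_tgt = tradaboost_round(w_src, w_tgt, [0, 1], [1, 0], n_rounds=10)
```

In this toy run the source instance far from the target domain starts with a smaller weight and loses more weight once it is misclassified, which is the general intuition behind starting from distribution-aware weights instead of uniform ones.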
In our experiments, both DisTrAdaboost and TrAdaboost achieve good performance on the AIMed corpus; when the same experiment is performed on IEPA, TrAdaboost falls into "negative transfer", while DisTrAdaboost maintains transfer efficiency.

(2) We propose a word representation approach to feature representation to overcome the "data sparseness" and "vocabulary gap" problems. We employ unsupervised word representation learning to learn latent semantic information from large unannotated data. Each word is then mapped to a real-valued vector or assigned to a cluster based on this semantic information, so that similar words share similar representations, which alleviates both problems. In our experiments, we employ three word representation methods: distributed representation, vector clustering representation, and Brown clustering representation, and compare their effects on the PPI extraction task. Experimental results show that the distributed representation method brings large improvements on five public PPI corpora (AIMed, BioInfer, HPRD50, IEPA and LLL) and performs much better than the two clustering-based representation methods, achieving F-scores of 69.7%, 74.0%, 78.0%, 76.5% and 87.3%, respectively, which is better than other state-of-the-art methods.
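To make the "vocabulary gap" concrete, the following minimal sketch compares one-hot vectors with dense word vectors and then derives a cluster-ID feature from the dense vectors. The three-word vocabulary, the two-dimensional embeddings, and the cluster centres are invented for illustration; they are not taken from the thesis, which learns its representations from large corpora.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

vocab = ["binds", "interacts", "gene"]

# One-hot encoding: every pair of distinct words is orthogonal.
one_hot = {
    w: [1.0 if i == j else 0.0 for j in range(len(vocab))]
    for i, w in enumerate(vocab)
}

# Toy dense vectors (made up): related words point in similar directions.
embedding = {
    "binds": [0.9, 0.1],
    "interacts": [0.8, 0.2],
    "gene": [0.1, 0.9],
}

def nearest_cluster(vec, centroids):
    """Index of the centroid most similar to vec (cosine similarity)."""
    return max(range(len(centroids)), key=lambda k: cosine(vec, centroids[k]))

# Hypothetical cluster centres; in practice they would come from clustering
# the learned vectors (for example with k-means).
centroids = [[1.0, 0.0], [0.0, 1.0]]
cluster_id = {w: nearest_cluster(v, centroids) for w, v in embedding.items()}

sim_one_hot = cosine(one_hot["binds"], one_hot["interacts"])
sim_dense = cosine(embedding["binds"], embedding["interacts"])
```

With one-hot features, "binds" and "interacts" have zero similarity, so a classifier that saw only one of them in training learns nothing about the other. With dense vectors or shared cluster IDs, the two words produce similar features.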

Keywords/Search Tags:

Protein-Protein Interaction, negative transfer, transfer learning, data sparseness, vocabulary gap, word representation

Related items

1	Research On Protein-protein Interactions Extraction Methods Based On Biomedical Text Mining
2	Prediction Of Protein-protein Interactions Based On Wavelet Transform And Ensemble Learning
3	Interaction Analysis Of PUF Protein And Its Target By FRET With Engineered ECFP And FAM-Labeled Oligonucleotides
4	Research On Identification And Application Of Protein Complexes In Protein-Protein Interaction Networks
5	Weakly Supervised Protein-protein Interaction Identification Based On Complex Network And Graph Embedding Representation
6	Research On Predicting Protein-protein Interactions Based On Machine Learning
7	Study On The Mechanism Of Extracellular Electron Transfer In Electro-producing Microorganisms Based On The Interaction Of Cytochrome C
8	Research Of Protein-protein Interactions Prediction Based On Deep Learning
9	Predicting Large-scale Protein-protein Interactions Based On Sparse Representation Based Classifier Model
10	Investigating Protein Interaction Network And The Function Of Electron Transfer For Electricigens