
Protein-Protein Interaction Extraction Based On Kernels And SVD

Posted on: 2010-07-24
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhang
GTID: 2178360302960704
Subject: Computer application technology

Abstract/Summary:
As the volume of biomedical literature grows rapidly, a huge amount of biomedical knowledge remains locked in text, and biomedical text mining has become an active research topic. Protein-protein interaction (PPI) extraction, a subtask of biomedical text mining, helps not only to build protein interaction networks but also to predict protein function and to design new drugs. This thesis studies PPI extraction methods based on kernels and singular value decomposition (SVD), focusing on four approaches: ensemble-kernel PPI extraction, multiple-kernel-learning PPI extraction, co-training-based PPI extraction, and SVD-based PPI extraction.

First, this thesis proposes an ensemble kernel for PPI extraction that combines two kernels: a self-defined path kernel and a feature-based linear kernel. The path kernel is designed over the path extracted from the syntax tree and takes both the length and the dimension of the path into account. The ensemble kernel achieved good results on PPI extraction.

Second, this thesis performs PPI extraction with multiple kernels, combining a feature-based kernel, a tree kernel and a path kernel. Together these kernels capture syntactic, semantic and word-level information, along with other information useful for PPI extraction, and they achieved better performance.

Third, this thesis proposes co-training to address the shortage of labeled data, which is expensive because biomedical researchers must spend a great deal of time labeling it manually. Co-training is a semi-supervised learning method that requires only a small set of labeled data and a large amount of unlabeled data. It needs two views, here a tree view and a word-feature view. The two classifiers trained on the two views learn from each other until their outputs differ only slightly.

Finally, we use several kinds of features, such as unigrams and bigrams with position information, to perform PPI extraction.
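
The ensemble kernel of the first method, fed with position-tagged unigram and bigram features of the kind mentioned in the last method, can be illustrated with a minimal sketch. The abstract does not specify the path kernel, the position scheme or the kernel weights, so all of these are assumptions here: `position_tag` (before/between/after tags), the label-overlap `path_kernel` with an RBF length term, the weight `alpha`, and the toy sentences and syntax paths are hypothetical. Only the general construction, a non-negative weighted sum of cosine-normalised kernels, is a standard way to obtain a valid combined kernel.

```python
import math
from collections import Counter


def position_tag(i, p1, p2):
    """Hypothetical tag for token i relative to protein mentions at p1 < p2."""
    if i < p1:
        return "B"  # before the first protein
    if i > p2:
        return "A"  # after the second protein
    return "M"      # between (or on) the two proteins


def ngram_features(tokens, p1, p2):
    """Count unigrams and bigrams, each prefixed with its position tag."""
    words = [t.lower() for t in tokens]
    feats = Counter()
    for i, w in enumerate(words):
        feats[f"uni:{position_tag(i, p1, p2)}:{w}"] += 1
    for i in range(len(words) - 1):
        # a bigram is tagged by the position of its first token
        feats[f"bi:{position_tag(i, p1, p2)}:{words[i]}_{words[i + 1]}"] += 1
    return feats


def linear_kernel(f1, f2):
    """Feature-based linear kernel: dot product of two sparse count vectors."""
    return sum(v * f2.get(k, 0) for k, v in f1.items())


def path_kernel(path1, path2, gamma=0.1):
    """Hypothetical path kernel over syntax-tree node labels.

    Bag-of-labels overlap times an RBF term on path length, so paths of
    similar length score higher. Both factors are valid kernels, hence
    so is their product.
    """
    overlap = linear_kernel(Counter(path1), Counter(path2))
    length_sim = math.exp(-gamma * (len(path1) - len(path2)) ** 2)
    return overlap * length_sim


def normalized(kernel, x, y):
    """Cosine-normalise a kernel so that normalized(k, x, x) == 1."""
    denom = math.sqrt(kernel(x, x) * kernel(y, y))
    return kernel(x, y) / denom if denom > 0 else 0.0


def ensemble_kernel(a, b, alpha=0.5):
    """Weighted sum of the normalised path kernel and linear kernel."""
    kp = normalized(path_kernel, a["path"], b["path"])
    kl = normalized(linear_kernel, a["feats"], b["feats"])
    return alpha * kp + (1 - alpha) * kl


def make_instance(tokens, p1, p2, path):
    """One candidate protein pair: n-gram features plus its syntax path."""
    return {"feats": ngram_features(tokens, p1, p2), "path": path}


# Hypothetical toy candidates (protein mentions already blinded as PROT1/PROT2)
pairs = [
    make_instance("PROT1 binds to PROT2 in vitro".split(), 0, 3,
                  ["NN", "NP", "S", "VP", "PP", "NP", "NN"]),
    make_instance("PROT1 interacts with PROT2".split(), 0, 3,
                  ["NN", "NP", "S", "VP", "PP", "NP", "NN"]),
    make_instance("PROT1 and PROT2 were purified separately".split(), 0, 2,
                  ["NN", "NP", "NN"]),
]
# Gram matrix that an SVM with a precomputed kernel could consume
K = [[ensemble_kernel(a, b) for b in pairs] for a in pairs]
```

The resulting matrix `K` could be passed to any SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. A multiple-kernel variant in the spirit of the second method would add a tree-kernel term with its own weight; keeping all weights non-negative keeps the sum a valid kernel.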
We applied singular value decomposition to extract syntactic and semantic features for PPI extraction; compared with other methods in cross-corpus experiments, this approach achieved better performance.

The four methods above suit different situations, and each performs well in its own setting. Regardless of the situation, the multiple-kernel method outperformed the other methods, so it can be applied widely across different fields and is expected to achieve better performance.
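
The SVD step can be sketched as a latent-semantic-analysis style projection. This assumes, beyond what the abstract states, that SVD is applied to an instance-by-feature count matrix (for instance the unigram and bigram features described above) and that the top-k right singular vectors form the reduced feature space. The function `svd_project`, the rank `k` and the toy matrices are hypothetical.

```python
import numpy as np


def svd_project(X, k):
    """Project an instance-by-feature matrix onto its top-k right singular vectors.

    Returns the reduced training matrix (n x k) and the feature-by-k basis,
    which can be reused to project unseen instances, e.g. from another corpus.
    """
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Vk = Vt[:k].T  # columns are orthonormal latent directions
    return X @ Vk, Vk


# Hypothetical toy data: rows are candidate protein pairs, columns are features
X_train = np.array([[1, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 0, 1, 1]], dtype=float)
Z_train, Vk = svd_project(X_train, k=2)

# An instance from a different (test) corpus is mapped with the same basis
x_test = np.array([1, 0, 0, 1], dtype=float)
z_test = x_test @ Vk
```

Fitting the basis on the training corpus only and reusing it on the test corpus keeps a cross-corpus comparison fair; the reduced vectors can then be fed to any classifier.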
Keywords/Search Tags: PPI, kernel, co-training, SVD