Protein-Protein Interaction Extraction Using Combined Kernel And Active Learning

Posted on: 2011-05-07; Degree: Master; Type: Thesis
Country: China; Candidate: J Y Ping; Full Text: PDF
GTID: 2178330332960908; Subject: Computer application technology
Abstract/Summary:
With the spread of digitization, a large amount of digital information is generated every day. In the biomedical field in particular, the scientific literature is growing exponentially, and it is becoming more and more difficult to find relevant information in such a large body of literature. How to automatically extract this information from the literature has become an active research topic.

As an important branch of biomedical information extraction, automatic protein-protein interaction (PPI) extraction is gaining more and more attention from biomedical researchers. This thesis addresses the automatic extraction of protein interactions from the biomedical literature. First, it applies a Support Vector Machine (SVM) with a combined kernel to PPI extraction. Second, it combines the SVM with a combined kernel and an active learning method for PPI extraction.

Machine learning methods for this task include feature-based methods and kernel-based methods. Feature-based methods use only the information of individual words and ignore the syntactic and semantic information of sentences, whereas kernel-based methods overcome this shortcoming and can capture the structural information of a sentence. To acquire richer and more useful information, including word features, distance features, and other structural information, we combine the two kinds of methods to construct a combined kernel. We also attempt to define a kernel with a more effective matching algorithm. This thesis uses these combined kernels to extract PPI. With them, we can use not only the word and distance features of sentences but also their syntactic features. The experimental results on the IEPA corpus and the AIMed corpus are higher with the combined kernel: the F-score is 77.2% on IEPA and 69.64% on AIMed.
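The idea of a combined kernel can be illustrated with a minimal sketch. The abstract does not give the kernel definitions, so everything here is an assumption made for illustration: the feature kernel is a cosine similarity over bag-of-words counts, the structure kernel is a normalized n-gram match over a hypothetical dependency-path label sequence, and the two are mixed with a hypothetical weight `alpha`. A convex combination of two valid kernels is itself a valid kernel, which is what allows the result to be plugged into an SVM.

```python
import math
from collections import Counter


def cosine_kernel(counts_a, counts_b):
    """Normalized linear kernel between two sparse count vectors."""
    dot = sum(v * counts_b.get(k, 0) for k, v in counts_a.items())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def ngram_counts(tokens, max_n=3):
    """Count all contiguous n-grams of length 1..max_n in a sequence."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def feature_kernel(a, b):
    """Word-level similarity (assumed stand-in for the feature-based part)."""
    return cosine_kernel(Counter(a["words"]), Counter(b["words"]))


def structure_kernel(a, b, max_n=3):
    """Path-level similarity (assumed stand-in for the structural part)."""
    return cosine_kernel(ngram_counts(a["path"], max_n),
                         ngram_counts(b["path"], max_n))


def combined_kernel(a, b, alpha=0.5):
    """Convex combination of the two kernels; alpha is a hypothetical weight."""
    return alpha * feature_kernel(a, b) + (1 - alpha) * structure_kernel(a, b)


# Two toy candidate instances; the field names "words" and "path" are illustrative.
x1 = {"words": ["PROT1", "binds", "to", "PROT2"],
      "path": ["nsubj", "binds", "prep_to"]}
x2 = {"words": ["PROT1", "interacts", "with", "PROT2"],
      "path": ["nsubj", "interacts", "prep_with"]}
print(combined_kernel(x1, x2))
```

In practice the weight `alpha` would be tuned on held-out data, and the structure kernel would be replaced by whatever tree or graph kernel the system actually uses.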
The experimental results show that the combined kernel is effective for PPI extraction.

Traditional machine learning methods are usually supervised and are trained on large manually annotated corpora, but such corpora are very difficult to acquire. Therefore, this thesis applies an active learning method to PPI extraction, with the SVM with the combined kernel used as the learner. Experiments show that the proposed method not only reduces the size of the training corpus and improves the performance of the PPI extraction system on the same corpus, but also achieves the same performance on different corpora.
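The active learning loop described above can be sketched as follows. The abstract does not name the query strategy, so this sketch assumes pool-based uncertainty sampling: at each round the learner asks for the label of the unlabeled example closest to its current decision boundary. To keep the sketch self-contained, a simple kernel perceptron stands in for the SVM; all function names and parameters are hypothetical.

```python
def train_kernel_perceptron(examples, labels, kernel, epochs=10):
    """Return dual weights for a kernel perceptron (stand-in for the SVM)."""
    alphas = [0] * len(examples)
    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(examples, labels)):
            score = decision_value(x, examples, labels, alphas, kernel)
            if y * score <= 0:  # misclassified or on the boundary
                alphas[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alphas


def decision_value(x, examples, labels, alphas, kernel):
    """Signed score of x; its absolute value measures the learner's confidence."""
    return sum(a * y * kernel(xj, x)
               for a, xj, y in zip(alphas, examples, labels))


def active_learn(seed, pool, oracle, kernel, rounds, batch_size=1):
    """Pool-based active learning with uncertainty sampling (assumed strategy).

    seed   -- list of (example, label) pairs, labels in {-1, +1}
    pool   -- list of unlabeled examples
    oracle -- function returning the true label (the human annotator)
    """
    labeled_x = [x for x, _ in seed]
    labeled_y = [y for _, y in seed]
    pool = list(pool)
    for _ in range(rounds):
        if not pool:
            break
        alphas = train_kernel_perceptron(labeled_x, labeled_y, kernel)
        # Least confident first: smallest |decision value|.
        pool.sort(key=lambda x: abs(
            decision_value(x, labeled_x, labeled_y, alphas, kernel)))
        queried, pool = pool[:batch_size], pool[batch_size:]
        for x in queried:
            labeled_x.append(x)
            labeled_y.append(oracle(x))
    alphas = train_kernel_perceptron(labeled_x, labeled_y, kernel)
    return labeled_x, labeled_y, alphas


# Toy 1-D usage: the true boundary is at 0 and the kernel is linear with a bias.
toy_kernel = lambda a, b: a * b + 1
toy_oracle = lambda x: 1 if x > 0 else -1
xs, ys, alphas = active_learn([(-3.0, -1), (3.0, 1)],
                              [-2.5, -0.2, 0.3, 2.0, -1.0, 1.5],
                              toy_oracle, toy_kernel, rounds=2)
print(xs, ys)
```

Because only the queried examples need labels, the annotation effort is concentrated on the examples the current model finds hardest, which is the mechanism by which active learning can reduce the size of the training corpus.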
Keywords/Search Tags: PPI, SVM, Combined Kernel, Active Learning