
Protein-Protein Interaction Extraction Based On Combinational Learning And Active Learning

Posted on: 2016-01-17   Degree: Master   Type: Thesis
Country: China   Candidate: M J Liu
GTID: 2180330461476544   Subject: Computer system architecture
Abstract/Summary:
The continuous development of the life sciences has led to explosive growth in the biomedical literature, so intelligent methods that can automatically extract useful information from this mass of text are urgently needed. As the Internet has matured, information extraction technology has developed rapidly and has had a significant impact on biomedical research. Protein-Protein Interaction (PPI) extraction is an important application of information extraction in the biomedical field; it aims to mine latent knowledge at the molecular level. Centering on the PPI extraction problem, this thesis carries out the following research.

To address feature deficiency and the limited decision-making ability of a single classifier, this thesis proposes a combinational learning method that focuses on feature design and multi-classifier integration. For feature selection, rich features are extracted from the sentence context and syntactic structure to construct feature vectors, and the information gain method is then used to screen the features. For classifier integration, three classifiers with different decision-making schemes and relatively high precision are trained separately: Support Vector Machine (SVM), Maximum Entropy (ME) and Naive Bayes (NB). Their classification results are then integrated by linear weighting, so that a classifier that performs better receives a larger weight. The combinational learning approach achieved an F-score of 71% and an AUC of 92.9% on the AIMed corpus.

The combinational learning approach is only suitable when enough labeled corpus is available, whereas labeled corpora are relatively scarce in practice. To address this problem, this thesis proposes an active learning method built on combinational learning. This method is based on selecting uncertain samples.
It repeatedly selects the most informative samples from a large unlabeled corpus and ignores the uninformative ones; the selected samples are then annotated and added to the original training set. The active learning method not only achieves better PPI extraction performance but also reduces manual annotation effort. The active learning experiments also achieve higher AUC scores on the four corpora other than LLL, and the approach generalizes better on large corpora such as AIMed and BioInfer.
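The abstract says candidate features are screened with information gain but gives no formula. As a hedged illustration only, the Python sketch below computes the standard information gain of binary (present/absent) features and keeps the top k. The function names and the toy feature strings (`kw:interacts`, `path:nsubj-dobj`, ...) are hypothetical and are not the thesis's actual feature set.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(samples, labels, feature):
    """IG(Y; f) = H(Y) - P(f present) H(Y | present) - P(f absent) H(Y | absent).

    `samples` is a list of sets holding the binary features present in each instance.
    """
    present = [y for x, y in zip(samples, labels) if feature in x]
    absent = [y for x, y in zip(samples, labels) if feature not in x]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def select_features(samples, labels, k):
    """Keep the k features with the highest information gain (ties broken by name)."""
    vocabulary = set().union(*samples)
    ranked = sorted(vocabulary, key=lambda f: (-information_gain(samples, labels, f), f))
    return ranked[:k]


# Hypothetical toy data: 1 = interacting protein pair, 0 = non-interacting.
samples = [
    {"kw:interacts", "path:nsubj-dobj"},
    {"kw:binds", "path:nsubj-dobj"},
    {"kw:interacts", "neg:not"},
    {"path:conj"},
]
labels = [1, 1, 0, 0]
```

In this toy set `path:nsubj-dobj` separates the two classes perfectly, so `select_features(samples, labels, 1)` returns `['path:nsubj-dobj']` (an information gain of 1 bit), while `kw:interacts`, which occurs in one positive and one negative pair, has an information gain of 0.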
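The linear weighted integration of SVM, ME and NB can be sketched as follows. This assumes each classifier outputs a probability that a protein pair interacts and that the weights are proportional to each classifier's validation F-score. The abstract only states that the better-performing classifier receives a larger weight, so this weighting rule, the 0.5 threshold, the function names and all numbers below are illustrative assumptions.

```python
def normalized_weights(validation_scores):
    """Map each classifier's validation score (assumed: F-score) to a weight.

    Weights sum to 1, so a better-performing classifier gets a
    proportionally larger weight.
    """
    total = sum(validation_scores.values())
    return {name: score / total for name, score in validation_scores.items()}


def combine(probabilities, weights, threshold=0.5):
    """Linear weighted combination of each classifier's P(interaction) for one pair.

    Returns the combined score and the binary decision (threshold is an assumption).
    """
    score = sum(weights[name] * p for name, p in probabilities.items())
    return score, score >= threshold


# Hypothetical numbers for illustration only (not results from the thesis).
weights = normalized_weights({"svm": 0.70, "maxent": 0.60, "nb": 0.50})
score, is_interaction = combine({"svm": 0.9, "maxent": 0.4, "nb": 0.2}, weights)
```

With these made-up validation F-scores, SVM receives the largest weight (about 0.39) and the combined score is about 0.54, so the pair is labeled as interacting even though ME and NB alone would lean negative.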
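The uncertainty-based selection loop can be sketched in the same hedged way. This assumes uncertainty is measured by how close the model's P(interaction) is to 0.5. `train` and `oracle` are hypothetical callables standing in for retraining the combined classifier and for the human annotator, and `batch_size` and `rounds` are illustrative parameters, not values from the thesis.

```python
def uncertainty(p):
    """Assumed measure: 1.0 when P(interaction) = 0.5, 0.0 when p is 0 or 1."""
    return 1.0 - abs(2.0 * p - 1.0)


def select_uncertain(pool, predict_proba, batch_size):
    """Indices of the batch_size pool items the current model is least sure about."""
    ranked = sorted(range(len(pool)),
                    key=lambda i: uncertainty(predict_proba(pool[i])),
                    reverse=True)
    return ranked[:batch_size]


def active_learning(labeled, pool, train, oracle, batch_size, rounds):
    """labeled: list of (x, y); pool: list of unlabeled x.

    train(labeled) -> predict_proba (hypothetical retraining step);
    oracle(x) -> y stands in for manual annotation.
    """
    labeled = list(labeled)
    pool = list(pool)
    for _ in range(rounds):
        if not pool:
            break
        predict_proba = train(labeled)
        chosen = set(select_uncertain(pool, predict_proba, batch_size))
        # Annotate the selected samples and add them to the training set.
        labeled.extend((pool[i], oracle(pool[i])) for i in sorted(chosen))
        # The remaining, less informative samples stay unlabeled.
        pool = [x for i, x in enumerate(pool) if i not in chosen]
    return train(labeled), labeled, pool


# Toy run (hypothetical): the "model" returns x itself as P(interaction),
# and the oracle labels x >= 0.5 as positive.
model, labeled, remaining = active_learning(
    labeled=[(0.0, 0), (1.0, 1)],
    pool=[0.05, 0.45, 0.52, 0.95],
    train=lambda data: (lambda x: x),
    oracle=lambda x: int(x >= 0.5),
    batch_size=1,
    rounds=2,
)
```

In the toy run the two pool items nearest 0.5 (0.52, then 0.45) are annotated and added to the training set, while the confidently classified items 0.05 and 0.95 are never sent for annotation, which is the source of the annotation savings the abstract describes.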
Keywords/Search Tags: Protein-Protein Interaction Extraction, Rich Feature, Feature Selection, Combinational Learning, Active Learning