Prediction Of Protein Roles In Signaling Pathways Using Support Vector Machine

Posted on: 2016-11-25
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Liang
GTID: 2180330479493920
Subject: Computer system architecture
Abstract/Summary:
Signal transduction is an important part of cellular activity: it carries a variety of biological functions and is closely related to the occurrence of many known diseases. Studies of signaling pathways are therefore of practical significance. In recent years, signaling pathways have been predicted with varying degrees of accuracy by scoring methods based on different protein features, but few studies provide a method for determining the specific roles that proteins play in signaling pathways.

Based on protein sequence data for the different roles in 18 kinds of signaling pathways across multiple species, and using different protein features to encode the feature vectors, this paper proposes two methods that use support vector machines (SVM) to predict the specific roles proteins play in different signaling pathways.

First, this paper uses KGML description files from the KEGG database and the KEGG API services to obtain the proteins of all roles in the signaling pathways included in the experiment, and then constructs a basic model signaling pathway for each type of signaling pathway. Next, four protein features (protein similarity, subcellular localization, transmembrane topology and signal peptide) are used to construct feature vectors. Two-class and multi-class SVM prediction methods are then designed, training and test sets are constructed for both methods, and the encoded data are used for training and prediction. Finally, the experimental results are assessed and analyzed.

For the two-class SVM prediction method, each role of each signaling pathway is assigned its own classifier. This method comes in two variants: in the first, the subcellular localization value in the protein feature vector is a positive real number; in the second, it takes bit values. For the multi-class SVM prediction method, each signaling pathway is assigned one classifier, and the role number is used as the class label of the sample data.

Experimental results show that, at the cost of some recall, the second two-class SVM prediction method improves precision and the overall F-measure and yields better classification results than the first. The multi-class SVM prediction method requires fewer classifiers, but it achieves lower precision than the second two-class method and cannot predict multiple roles for a single protein. Considering both the experimental results and the complexity, the second two-class SVM prediction method is therefore the more suitable one for predicting the specific roles that proteins play in signaling pathways.
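
The two subcellular-localization encodings, the per-role two-class labels, and the evaluation measures named in the abstract can be sketched in plain Python. This is only an illustration under stated assumptions: the compartment list, the numeric codes, the role names, and all function names are hypothetical, since the abstract does not give the exact encoding or categories used in the thesis.

```python
# Hypothetical compartment list; the abstract does not enumerate the categories.
COMPARTMENTS = ["membrane", "cytoplasm", "nucleus", "extracellular"]


def encode_real(location):
    """First variant (assumed form): one positive real number per compartment."""
    return [float(COMPARTMENTS.index(location) + 1)]


def encode_bits(location):
    """Second variant (assumed form): one bit per compartment."""
    return [1 if name == location else 0 for name in COMPARTMENTS]


def binary_labels(protein_roles, target_role):
    """Labels for one per-role classifier: +1 if the protein has the role, else -1.

    Each protein carries a set of roles, so the same protein can be a positive
    sample for several role classifiers.
    """
    return [1 if target_role in roles else -1 for roles in protein_roles]


def precision_recall_f(y_true, y_pred):
    """Precision, recall and F-measure for the positive class (+1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t != 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p != 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy data with made-up role names: three proteins and their role sets.
roles = [{"receptor"}, {"receptor", "kinase"}, {"kinase"}]
receptor_labels = binary_labels(roles, "receptor")
kinase_labels = binary_labels(roles, "kinase")
```

In this toy data the second protein is a positive sample for both the `receptor` and the `kinase` classifier. A single multi-class classifier assigns exactly one role number per protein, which is why the abstract notes that it cannot predict multiple roles for one protein.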
Keywords/Search Tags: signaling pathway, role, protein sequence, support vector machine