
The Expansion of Molecular Network Related to Petal Development in Arabidopsis thaliana and a Novel Multi-class Decomposition Strategy

Posted on: 2018-05-27 | Degree: Master | Type: Thesis
Country: China | Candidate: L Yang
GTID: 2370330566963930 | Subject: Bioinformatics
Abstract/Summary:
Support vector machine (SVM), which is based on statistical learning theory and follows the structural risk minimization principle, can effectively deal with nonlinearity, the curse of dimensionality, overfitting, and local minima, and it has strong generalization ability. Protein-protein interactions (PPIs) play an important role in biological regulation, and predicting them is a typical and complex binary classification problem. The application of SVM to binary classification is illustrated by expanding the molecular network related to petal development in Arabidopsis thaliana, based on MADS-box proteins and the PPI network. In practice, multi-class problems are more common and more complex than binary ones, so this thesis also presents a new strategy for decomposing multiple classes into two classes, together with a feature selection method, which greatly improves independent prediction accuracy. The results are as follows.

The expansion of the molecular network related to petal development based on MADS-box proteins and the PPI network in A. thaliana

Elucidating the regulatory mechanism of floral organ development is important for the study of evolution, development, and ecology. In this work, we obtained a protein-protein interaction network related to the petals of A. thaliana by integrating protein-protein interaction, subcellular localization, gene-chip, and gene functional annotation databases, and by building a reliable SVM-based model for predicting protein-protein interactions. With proteins containing the MADS-box domain as bait proteins, a one-level expansion was performed in the network, yielding an expanded network of thirty-eight proteins and sixty-seven PPIs. Gene functional annotation in the DAVID database suggested that the biological processes of most proteins in the expanded network are related to the regulation of flower development. Nineteen candidate tetrameric interactions, involving eight genes, were derived from the expanded network. None of these eight genes belongs to the ABCDE model genes; AGL16, which contains a MADS-box domain, may be a new member or a redundant gene of class B. The expanded network indicated that SEU, LUH, CHR4, CHR11, CHR17, and AT3G04960 are candidate targets of the petal AP1-AP3-PI-SEP tetramers of A. thaliana. These results provide a reference for in-depth analysis of the molecular regulatory network related to petal development in A. thaliana.

A novel strategy for decomposing multiple classes into two classes and a feature selection method

In real applications, multi-class problems are more general and harder to predict than binary classification problems. An appropriate decomposition strategy and an effective feature selection method are the keys to improving the prediction accuracy of multi-class problems. By integrating the advantages of the One-vs-All (OVA), One-vs-One (OVO), and hierarchical classification (HC) strategies, this thesis develops a new strategy, called Player Killing (PK), that decomposes multiple classes into two classes in two phases. The strategy also uses different training samples in the feature selection and decision-making phases. The conventional minimal redundancy maximal relevance (mRMR) feature selection approach has two drawbacks: its relevance and redundancy measurements are not comparable, and feature introduction cannot be terminated automatically. In contrast, the maximal information coefficient (MIC) provides a universal measure of linear or nonlinear correlation between two variables. Using MIC and redundancy sharing, this thesis develops a new universal feature selection approach with automatic termination, called MIC-share. According to the independent prediction results on 11 UCI multi-class datasets, MIC-share outperforms mRMR, and the PK strategy outperforms the OVA, OVO, and HC strategies. Furthermore, with support vector classification (SVC), the weighted average prediction accuracy of the MIC-share-PK model is 85.42%, higher than that of the three traditional models mRMR-OVA (67.61%), mRMR-OVO (78.99%), and mRMR-HC (77.07%). The MIC-share-PK model has a wide range of potential multi-class applications.
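The one-level expansion from bait proteins can be illustrated with a minimal sketch. It assumes the PPI network is an undirected edge list and that the expanded network keeps the baits, their direct interaction partners, and every interaction among those proteins; the abstract does not state the exact rule, and the function name and toy protein labels are hypothetical, not thesis data.

```python
from collections import defaultdict


# Hypothetical helper; the expansion rule below is an assumption, not the thesis method.
def one_level_expansion(edges, baits):
    """Return (nodes, edges) of the subnetwork one interaction away from the baits."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    # Baits plus every protein directly connected to at least one bait.
    nodes = set(baits)
    for bait in baits:
        nodes |= adjacency[bait]

    # Induced subgraph: keep each undirected interaction once, as a sorted pair.
    kept = {tuple(sorted((a, b))) for a, b in edges if a in nodes and b in nodes}
    return nodes, kept


if __name__ == "__main__":
    # Toy labels only; these are not Arabidopsis proteins from the thesis.
    toy_edges = [("BAIT1", "P1"), ("BAIT1", "P2"), ("P1", "P2"),
                 ("P2", "P3"), ("BAIT2", "P4"), ("P5", "P6")]
    nodes, kept = one_level_expansion(toy_edges, ["BAIT1", "BAIT2"])
    print(sorted(nodes))  # ['BAIT1', 'BAIT2', 'P1', 'P2', 'P4']
    print(len(kept))      # 4
```

Whether interactions between two non-bait partners (such as P1-P2 here) belong in the expanded network is a design choice; dropping them would give a bait-centred star network instead.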
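PK is described as combining the OVA, OVO, and HC strategies, but the abstract does not give its two-phase procedure, so it is not reproduced here. As background, the sketch below shows how the two conventional decompositions it builds on turn a k-class problem into binary tasks (k tasks for OVA, k(k-1)/2 for OVO) and how OVO decisions are commonly combined by majority voting. The function names are hypothetical, and the binary classifiers (SVMs in the thesis) are abstracted away as already-made pairwise decisions.

```python
from collections import Counter
from itertools import combinations


# Hypothetical helper names illustrating the conventional strategies, not PK itself.
def ova_tasks(classes):
    """One-vs-All: one binary task per class (that class vs. all the others)."""
    return [(c, tuple(o for o in classes if o != c)) for c in classes]


def ovo_tasks(classes):
    """One-vs-One: one binary task per unordered pair of classes."""
    return list(combinations(classes, 2))


def ovo_vote(pairwise_winners):
    """Majority vote over OVO decisions for a single sample.

    pairwise_winners maps each (class_a, class_b) task to the class its binary
    classifier predicted. Ties go to the class that received a vote first.
    """
    votes = Counter(pairwise_winners.values())
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    classes = ["A", "B", "C", "D"]
    print(len(ova_tasks(classes)))  # 4 binary tasks
    print(len(ovo_tasks(classes)))  # 6 binary tasks, k(k-1)/2
    winners = {("A", "B"): "A", ("A", "C"): "C", ("A", "D"): "A",
               ("B", "C"): "C", ("B", "D"): "D", ("C", "D"): "C"}
    print(ovo_vote(winners))  # C (3 votes vs. 2 for A and 1 for D)
```

In practice, scikit-learn's `OneVsRestClassifier` and `OneVsOneClassifier` (in `sklearn.multiclass`) wrap a binary estimator such as `SVC` in these decompositions, and `SVC` itself handles multi-class data with a one-vs-one scheme internally.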
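The conventional mRMR baseline that MIC-share is compared against can be sketched as follows. This is the common greedy difference form, which scores each candidate feature by its mutual information with the class minus its mean mutual information with the features already selected; it assumes discrete (or pre-discretized) features. MIC-share itself is not reproduced, because the abstract does not give its formulas, and the function names are hypothetical. The sketch makes the termination issue visible: the caller has to choose k.

```python
import math
from collections import Counter


# Hypothetical helper names; a sketch of standard mRMR, not the thesis's MIC-share.
def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # Sum of p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts.
    return sum((n_ab / n) * math.log(n_ab * n / (px[a] * py[b]))
               for (a, b), n_ab in joint.items())


def mrmr(features, labels, k):
    """Greedy mRMR, difference form: relevance minus mean redundancy.

    features is a list of discrete feature columns. The number of features k
    must be fixed by the caller, because the criterion has no stopping rule.
    """
    if k <= 0 or not features:
        return []
    relevance = [mutual_information(f, labels) for f in features]
    # Start with the single most relevant feature.
    selected = [max(range(len(features)), key=lambda j: relevance[j])]
    while len(selected) < min(k, len(features)):
        best, best_score = None, -math.inf
        for j in range(len(features)):
            if j in selected:
                continue
            redundancy = sum(mutual_information(features[j], features[s])
                             for s in selected) / len(selected)
            if relevance[j] - redundancy > best_score:
                best, best_score = j, relevance[j] - redundancy
        selected.append(best)
    return selected


if __name__ == "__main__":
    f0 = [0, 0, 0, 0, 1, 1, 1, 1]
    f2 = [0, 0, 1, 1, 0, 0, 1, 1]
    y = [a | b for a, b in zip(f0, f2)]  # label = f0 OR f2
    # Column 1 duplicates column 0, so it is relevant but fully redundant.
    print(sorted(mrmr([f0, list(f0), f2], y, k=2)))  # [0, 2]
```

Here relevance and redundancy happen to share one unit (nats), but their magnitudes still depend on the entropies of the class and of each feature, which is one way to read the comparability concern raised in the abstract.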
Keywords/Search Tags: Support vector machine, Classification, Protein-protein interaction prediction, Strategy for decomposing multiple classes into two classes, PK strategy, MIC-share