
Predicting Protein-Protein Interactions By Biostatistics Methods

Posted on: 2008-04-30  Degree: Master  Type: Thesis
Country: China  Candidate: J Hu  Full Text: PDF
GTID: 2120360212980999  Subject: Analytical Chemistry
Abstract/Summary:
Proteins are the primary components of the cellular machinery, and the body cannot function without them. Predicting the function and mechanisms of proteins is currently one of the most important topics in the life sciences. Many proteins mediate their biological function through protein interactions, which are crucial for many aspects of cellular biology. First, genetic interactions often correlate with physical interactions between the corresponding gene products. Second, protein interactions are required to physically tether the components of signal-transduction pathways. Third, enzyme-protein substrate interactions are important for catalysis and are often found to be more stable than presumed. Finally, protein interactions are crucial for the integrity of multicomponent enzymatic machines such as RNA polymerases and the spliceosome. Computational prediction of protein interactions has therefore been pursued under the assumption that identifying interaction partners for proteins of unknown function can provide insight into their biological function.

In this work, the positive dataset is downloaded from the Saccharomyces cerevisiae core subset of the DIP database. Since a noninteracting protein dataset is not readily available, a hypothetical noninteracting dataset is generated from subcellular localization information retrieved from the MIPS database; it consists of protein pairs that do not colocalize. First, each protein's amino acid sequence is converted into a feature vector using the CTD encoding approach. A set of support vector machines (SVMs) is then trained to predict protein interactions, and the prediction accuracy averages 79% over the ensemble of statistical experiments. After the set of parameter vectors is optimized by different strategies, the predictive accuracy obtained through 5-fold cross-validation tests is 82.43%, about 5% higher than that reported in the literature.
Then we predict protein interactions with the other four encoding approaches. All the results are better than those in the literature. The predictive...
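
The data-preparation steps described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes that CTD stands for the commonly used composition/transition/distribution descriptors, that a three-class hydrophobicity grouping of amino acids is used, and that a protein pair is represented by concatenating the two per-protein vectors; none of these details are stated in the abstract. Only the composition and transition parts are computed here (the distribution part is omitted for brevity), and all function names are hypothetical.

```python
from itertools import combinations

# ASSUMPTION: a three-class hydrophobicity grouping often used with CTD;
# the abstract does not say which property groupings were used.
GROUPS = {"1": "RKEDQN", "2": "GASTPHY", "3": "CLVIMFW"}
AA_TO_CLASS = {aa: cls for cls, aas in GROUPS.items() for aa in aas}


def ctd_composition_transition(seq):
    """Return 3 composition + 3 transition frequencies for one sequence."""
    # Map residues to class labels, skipping non-standard letters.
    classes = [AA_TO_CLASS[aa] for aa in seq.upper() if aa in AA_TO_CLASS]
    n = len(classes)
    if n == 0:
        return [0.0] * 6
    # Composition: fraction of residues falling in each class.
    composition = [classes.count(c) / n for c in "123"]
    # Transition: fraction of adjacent residue pairs that switch between
    # two given classes, in either direction.
    pairs = list(zip(classes, classes[1:]))
    transition = []
    for a, b in (("1", "2"), ("1", "3"), ("2", "3")):
        hits = sum(1 for x, y in pairs if {x, y} == {a, b})
        transition.append(hits / len(pairs) if pairs else 0.0)
    return composition + transition


def pair_features(seq_a, seq_b):
    """ASSUMPTION: a protein pair is the concatenation of both vectors."""
    return ctd_composition_transition(seq_a) + ctd_composition_transition(seq_b)


def non_colocalized_pairs(localization):
    """Yield protein pairs whose localization sets share no compartment.

    `localization` maps a protein ID to a set of compartment names
    (hypothetical input format, standing in for the MIPS annotations).
    """
    for a, b in combinations(sorted(localization), 2):
        if not (localization[a] & localization[b]):
            yield (a, b)
```

Feature vectors built this way, labelled 1 for DIP pairs and 0 for non-colocalized pairs, could then be passed to any SVM implementation and scored with 5-fold cross-validation.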
Keywords/Search Tags: Biostatistics, Protein-protein interaction prediction, Database of Interacting Proteins, Support vector machine, Fusion network, Two-stage SVM