
Study On Predicting Method Of Subcellular Distribution Of Pathogenic Gene And Bacterial Protein

Posted on: 2017-04-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Zhu
GTID: 1104330485463071
Subject: Computational Mathematics
Abstract/Summary:
Identifying gene-phenotype relationships is a central goal of molecular biology. Earlier findings indicate that genes causing the same or similar phenotypes tend to lie close to one another in the protein-protein interaction (PPI) network, and on this basis many network-based approaches built on different underlying models have been proposed. A recent comparative study showed that diffusion-based methods achieve state-of-the-art predictive performance. In this dissertation, a new diffusion-based method is proposed to prioritize candidate disease genes. The diffusion profile of a disease is defined as the stationary distribution over candidate genes of a random walk with restart that incorporates similarities between phenotypes. Candidate disease genes are then prioritized by comparing their diffusion profiles with that of the disease. The effectiveness of the method was demonstrated by leave-one-out cross-validation against control genes drawn from artificial linkage intervals and against randomly chosen genes, and a comparative study showed that it outperforms several classical diffusion-based methods. To further illustrate the method, we used it to predict new causal genes for 16 multifactorial diseases, including prostate cancer and Alzheimer’s disease; the top predictions agree well with reports in the literature. Our study indicates that integrating multiple information sources, especially phenotype similarity data, and introducing a global similarity measure between disease and gene diffusion profiles both help to prioritize candidate disease genes.

Information on the subcellular localization of bacterial proteins is essential for protein function prediction, genome annotation and drug design. Here we propose a novel approach that predicts the subcellular localization of bacterial proteins by fusing features derived from the position-specific scoring matrix (PSSM), Gene Ontology (GO) and PROFEAT.
A backward feature selection procedure based on a linear-kernel SVM was then used to rank the fused features and extract an optimal subset. Finally, an SVM trained on these optimal features was used to predict protein subcellular locations. To validate the method, we carried out jackknife cross-validation tests on three low-similarity datasets: M638, Gneg1456 and Gpos523. Overall accuracies of 94.98%, 93.21% and 94.57% were achieved on these three datasets, which are 1.8% to 10.9% higher than those of state-of-the-art tools. These comparisons suggest that our method could serve as a useful tool for speeding up the prediction of bacterial protein subcellular localization.
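The diffusion-profile idea from the disease-gene part of the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the restart probability of 0.7, the toy network, the seeding of the walk with known genes weighted by a hypothetical phenotype-similarity score, and the use of cosine similarity as the "global similarity measure" are all choices made here for illustration. Because a random walk with restart is linear in its restart vector, every gene's profile can be obtained at once from the closed form P = r (I - (1 - r) W)^(-1), where W is the column-normalized adjacency matrix; the iterative update is included to show what that closed form computes.

```python
import numpy as np


def column_normalize(A):
    """Turn an adjacency matrix into a column-stochastic transition matrix W."""
    deg = A.sum(axis=0)
    deg[deg == 0] = 1.0          # leave columns of isolated nodes as zeros
    return A / deg               # divides column j by deg[j]


def rwr(W, p0, restart=0.7, tol=1e-12, max_iter=10_000):
    """Random walk with restart: iterate p <- (1 - r) W p + r p0 until it stops changing."""
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


def diffusion_profiles(A, restart=0.7):
    """Column j is the stationary RWR distribution when the walk restarts at gene j.

    Closed form of the iteration above: P = r (I - (1 - r) W)^-1.
    """
    W = column_normalize(A)
    n = A.shape[0]
    return restart * np.linalg.inv(np.eye(n) - (1 - restart) * W)


def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def prioritize(A, seed_weights, candidates, restart=0.7):
    """Rank candidates by cosine similarity between each gene's diffusion profile
    and the disease profile (a walk restarting at the weighted seed genes)."""
    P = diffusion_profiles(A, restart)
    disease = P @ (seed_weights / seed_weights.sum())   # linearity of RWR
    scores = {g: cosine(P[:, g], disease) for g in candidates}
    return sorted(candidates, key=scores.get, reverse=True), scores


# Toy PPI network: two 5-gene modules joined by a single bridge edge (4, 5).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4),
         (5, 6), (5, 7), (6, 7), (6, 8), (7, 8), (8, 9),
         (4, 5)]
A = np.zeros((10, 10))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

seed = np.zeros(10)
seed[0] = 1.0   # known gene of the query disease
seed[1] = 0.6   # hypothetical: gene of a phenotype with similarity 0.6 to the query
order, scores = prioritize(A, seed, candidates=[8, 2, 9])
print(order)    # gene 2, in the same module as the seeds, ranks first
```

Comparing whole profiles rather than reading off a single entry of the disease profile is what makes the measure "global": two genes score highly when their walks spread over the same region of the network.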
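The localization pipeline (feature fusion, backward elimination with a linear SVM, and jackknife testing) can be sketched in the same hedged way. Everything here is an illustrative assumption rather than the authors' code: the PSSM, GO and PROFEAT blocks are replaced by synthetic stand-ins, the linear SVM is a small one-vs-rest model trained by subgradient descent instead of a library solver, backward elimination follows the SVM-RFE idea of repeatedly dropping the feature with the smallest squared weight, and all function names and hyperparameters are hypothetical.

```python
import numpy as np


def fuse_features(*blocks):
    """Concatenate per-protein feature blocks column-wise and z-score each column."""
    X = np.hstack(blocks)
    std = X.std(axis=0)
    std[std == 0] = 1.0                      # avoid dividing constant columns by zero
    return (X - X.mean(axis=0)) / std


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """One-vs-rest linear SVM fitted by full-batch subgradient descent on the
    L2-regularised hinge loss. Returns weights W (classes x features), biases b, classes."""
    classes = np.unique(y)
    n, d = X.shape
    W, b = np.zeros((len(classes), d)), np.zeros(len(classes))
    for k, c in enumerate(classes):
        t = np.where(y == c, 1.0, -1.0)      # +1 for class c, -1 for the rest
        w, bk = np.zeros(d), 0.0
        for _ in range(epochs):
            viol = t * (X @ w + bk) < 1      # samples violating the margin
            grad_w = lam * w - (t[viol][:, None] * X[viol]).sum(axis=0) / n
            grad_b = -t[viol].sum() / n
            w -= lr * grad_w
            bk -= lr * grad_b
        W[k], b[k] = w, bk
    return W, b, classes


def predict(W, b, classes, X):
    return classes[np.argmax(X @ W.T + b, axis=1)]


def svm_rfe_ranking(X, y):
    """Backward elimination: retrain, drop the feature with the smallest summed
    squared weight, repeat. Returns feature indices, most important first."""
    remaining, eliminated = list(range(X.shape[1])), []
    while remaining:
        W, _, _ = train_linear_svm(X[:, remaining], y)
        worst = int(np.argmin((W ** 2).sum(axis=0)))
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1]


def jackknife_accuracy(X, y):
    """Leave-one-out accuracy: predict each sample with a model trained on all the others."""
    n, correct = len(y), 0
    for i in range(n):
        keep = np.arange(n) != i
        W, b, classes = train_linear_svm(X[keep], y[keep])
        correct += int(predict(W, b, classes, X[i:i + 1])[0] == y[i])
    return correct / n


# Synthetic demo: 3 hypothetical locations, 20 proteins each.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 20)
centers = np.array([[3.0, 0.0], [-3.0, 0.0], [0.0, 3.0]])
pssm_like = centers[y] + rng.normal(0.0, 0.5, size=(60, 2))  # hypothetical informative block
go_like = rng.normal(size=(60, 5))                             # hypothetical noise block
profeat_like = rng.normal(size=(60, 5))                        # hypothetical noise block
X = fuse_features(pssm_like, go_like, profeat_like)

ranking = svm_rfe_ranking(X, y)
acc = jackknife_accuracy(X[:, ranking[:2]], y)
print(ranking[:2], acc)
```

In this setting "jackknife" is simply leave-one-out testing, which is why the loop retrains once per protein; with thousands of proteins and hundreds of fused features, an established SVM solver would replace the toy trainer above.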
Keywords/Search Tags: Disease gene prediction, Random walk, Protein-protein interaction network, Support vector machine, Prediction of bacterial protein subcellular localization