Prediction Of The Protein-Nucleic Acids Interaction Sites Based On Evolutionary Conservation And Structural Similarity

Posted on: 2013-08-03    Degree: Doctor    Type: Dissertation
Country: China    Candidate: T Li    Full Text: PDF
GTID: 1220330398496402    Subject: Biophysics
Abstract/Summary:
With the completion of genome sequencing projects for human and other species, biological research has gradually moved from the genomics era into the post-genomics era. One of the main objectives of the post-genomics era is to elucidate the mechanisms of interaction between biological macromolecules. In cells, proteins are the products of genetic information, the carriers of the most important biological activities, and the executors of cellular functions; they perform specific functions when they interact with other molecules. The interaction between proteins and nucleic acids (DNA/RNA) and the relationship between protein structure and function are core topics of the post-genomics era. The prediction of DNA/RNA-binding sites not only provides a foundation for this research but also helps reveal the nature of life activities. It is also crucial in structure-based drug design (SBDD), significant in computer-aided drug design (CADD), and an active topic in bioinformatics. However, only a fraction of the residues in a protein participate directly in the interaction with nucleic acid molecules. These interacting residues play crucial roles in the various biological functions of proteins, so characterizing and identifying functional residues or binding sites provides important clues for exploring protein function.
In recent years, several bioinformatics methods for identifying DNA/RNA-binding sites have been developed. In particular, machine learning-based methods have been applied to predict binding residues from sequence-derived or structure-derived features. In this work, a support vector machine (SVM) algorithm is developed for annotating protein-RNA interaction sites, based on evolutionary information, solvent accessible surface area, and the backbone torsion angles (φ, ψ) of the polypeptide chain.
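As a rough illustration of how the three feature types named above (evolutionary information, solvent accessible surface area, and backbone torsion angles) could be assembled into one per-residue input vector for an SVM, here is a minimal Python sketch. The function name, the use of a 20-value position-specific score row as a stand-in for evolutionary information, the logistic scaling, and the sin/cos encoding of the angles are all assumptions made for illustration; the abstract does not state the encoding actually used.

```python
import math


def residue_features(score_row, rel_asa, phi_deg, psi_deg):
    """Build one residue's feature vector (hypothetical encoding).

    score_row -- 20 evolutionary scores for this position (assumed input)
    rel_asa   -- relative solvent accessible surface area, assumed in [0, 1]
    phi_deg, psi_deg -- backbone torsion angles in degrees
    """
    # Squash raw scores into (0, 1) with a logistic function (assumed scaling).
    evo = [1.0 / (1.0 + math.exp(-s)) for s in score_row]
    # Encode each angle as (sin, cos) so that -180 and +180 degrees map
    # to the same point instead of opposite ends of a numeric range.
    phi = math.radians(phi_deg)
    psi = math.radians(psi_deg)
    torsion = [math.sin(phi), math.cos(phi), math.sin(psi), math.cos(psi)]
    return evo + [rel_asa] + torsion
```

The resulting vector has 20 + 1 + 4 = 25 entries per residue; any SVM implementation that accepts fixed-length numeric vectors could consume it.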
We then apply the SVM-based method to predict DNA-binding residues. More importantly, based on an analysis of the structural and topological features of the interaction between proteins and RNA, we explore and design effective characteristic parameters for representing this interaction and use them to predict DNA-binding residues in proteins. Finally, we develop a strategy for predicting DNA-binding sites that integrates structure-based and SVM-based methods. The research topics are outlined as follows:
1. A computational approach is proposed to annotate protein-RNA interaction sites by combining new feature parameters with a support vector machine. By analyzing the physicochemical attributes of different types of RNA-binding proteins, we found that the protein backbone structure (PBS) of the polypeptide chain reflects the difference between binding and non-binding sites, in addition to the well-known physicochemical attributes, evolutionary information, and solvent accessible surface area. We then introduce a novel weighting factor to quantify the distance-dependent contribution of each neighboring residue in determining the location of a binding residue. Finally, we combine the three features above with a support vector machine to predict RNA-binding sites. Evaluation by 5-fold cross-validation on the training set and on an independent test set demonstrates that PBS plays a more important role in RNA-binding site prediction and that combining the three features improves the results. Compared with conventional methods, our approach gives clearly improved results.
2. Our annotated RNA-binding residues are compared with the annotations in the PDB. The results show that our method predicts more binding residues that are in reasonable agreement with experimental data.
Some binding patterns can also be predicted by our method. For example, the four residues Thr111, Ser113, Ser120, and Thr122 in protein 1RPU_A interact with RNA concurrently. This ability to recognize RNA-binding patterns distinguishes our method from other models.
3. Based on our previous work, an improved method is developed to predict DNA-binding sites in proteins. In this section, we first build a new dataset consisting of 224 DNA-binding proteins. We then introduce a novel structural alignment algorithm to calculate a geometric decision value for each site based on amino acid-nucleic acid geometric structure information. The final result for each site is obtained by combining the two decision values from the SVM-based predictor and the geometric structure-based predictor. The method predicts DNA-binding sites with 85.06% sensitivity and 85.33% specificity when tested on a dataset of 62 protein-DNA complexes; both values are better than those of other predictors.
4. Based on a deep and comprehensive analysis of the results for the common PDNA-62 dataset, we found that some false positive sites, with confidence values above 90% in our results, are located in the DNA-binding region and have other biological functions unrelated to DNA binding. We also found that some false negative sites, with confidence values below 10%, are located at the beginning or end of α-helices or β-strands. As noted in earlier research, such residues are difficult to identify accurately because the regions in which they are located are usually randomly distorted. Identifying these functional sites is the focus of our future work.
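To make the combination step in item 3 and the reported metrics concrete, the following Python sketch merges two per-site scores and computes sensitivity and specificity from the resulting labels. The weighted average, the equal default weight, the 0.5 threshold, and all function names are assumptions for illustration only; the abstract does not say how the two decision values are actually combined.

```python
def combine_scores(svm_score, geom_score, weight=0.5):
    """Weighted average of two per-site scores, both assumed to lie in [0, 1]."""
    return weight * svm_score + (1.0 - weight) * geom_score


def predict_sites(svm_scores, geom_scores, threshold=0.5, weight=0.5):
    """Label a site 1 (binding) when its combined score reaches the threshold."""
    return [
        1 if combine_scores(s, g, weight) >= threshold else 0
        for s, g in zip(svm_scores, geom_scores)
    ]


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity
```

Sensitivity is the fraction of true binding sites that are recovered, and specificity is the fraction of non-binding sites correctly rejected, which is how figures such as 85.06% and 85.33% are conventionally defined.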
Keywords/Search Tags: Protein-Nucleic