Prediction of Protein-Nucleic Acid Interface Hot Spots

Posted on: 2021-04-13
Degree: Master
Type: Thesis
Country: China
Candidate: S J Zhang
GTID: 2370330620465636
Subject: Biology

Abstract/Summary:
Proteins and nucleic acids are two classes of macromolecules whose interactions are involved throughout development and metabolism. These interactions can be studied by observing changes in protein conformation in three-dimensional structures, and also by analyzing the changes in binding energy when amino acid residues interact with nucleotides. Numerous studies have shown that, at the protein-nucleic acid interface, a small number of residues are critical to the binding free energy; these residues are called hot spots. As major contributors to the binding free energy of protein-nucleic acid interactions, hot spot residues are of great interest to researchers.

Over the past few years, computational methods have been developed to identify hot spots in protein-RNA and protein-DNA complexes. Most existing protein-RNA hot spot prediction methods rely on prior knowledge of protein structure. However, because far more protein sequences than structures are available, sequence-based methods that predict hot spot residues directly from protein sequences are still needed. Most protein-DNA hot spot predictors are based on molecular dynamics simulation, an approach that is inefficient and unstable when dealing with large samples. With these limitations in mind, we carried out the following two studies:

1. We propose a sequence-based protein-RNA hot spot prediction method. Protein sequences are encoded with pseudo-amino acid composition, which takes the effects of adjacent residues into account and extracts information from the whole sequence to reveal the interaction pattern between the protein and the nucleic acid. We first quantify protein sequences using physicochemical properties, and then apply the pseudo-amino acid composition encoding scheme to extract a feature vector of the same dimension for each sequence. Combining these features with relative solvent accessible surface area and amino acid substitution matrix features, we use three classifiers (a support vector machine with a radial basis function kernel, a support vector machine with a sigmoid kernel, and K-nearest neighbor) to construct the final ensemble model. Our method achieves F1=0.843, MCC=0.657, and AUC=0.893. Compared with existing tools, it produces better prediction results.

2. We propose a structure-based protein-DNA hot spot prediction method. Because existing protein-DNA hot spot prediction methods cannot be applied on a large scale, we use a machine learning strategy to build a prediction tool based on a variety of biological features. First, we collect four types of features (solvent accessible surface area, structure, sequence, and network features), 114 dimensions in total. A random forest variable selection method is then used to select an optimal feature subset of 10 dimensions, and a support vector machine is applied to build the final model. Our method achieves F1=0.721, MCC=0.531, and AUC=0.803. Comparison with state-of-the-art methods shows that our prediction tool can accurately and efficiently identify hot spot residues at the protein-DNA interface.

Compared with existing protein-RNA and protein-DNA hot spot predictors, the two proposed methods achieve better prediction performance. They not only provide valuable insight into the principles governing protein-nucleic acid interaction, but also help to narrow the search space for drug design.
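The pseudo-amino acid composition encoding used in the first study can be illustrated with a minimal sketch. It follows the commonly cited general form of the scheme: 20 amino acid frequencies plus `lam` sequence-order correlation factors, jointly normalized. The abstract does not say which physicochemical properties, how many correlation tiers, or what weight the thesis uses, so the Kyte-Doolittle hydropathy scale, `lam=3`, `weight=0.05`, and the function names `standardize` and `pse_aac` are all assumptions made for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values. Assumption: chosen only as an example
# property; the thesis does not name the properties it uses.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(scale):
    """Rescale a property to zero mean and unit variance over the 20 residues."""
    values = [scale[a] for a in AMINO_ACIDS]
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return {a: (scale[a] - mean) / std for a in AMINO_ACIDS}


def pse_aac(sequence, properties, lam=3, weight=0.05):
    """Return a (20 + lam)-dimensional pseudo-amino acid composition vector.

    `properties` is a list of standardized property dictionaries.
    `lam` and `weight` are hypothetical defaults, not values from the thesis.
    """
    # Keep only the 20 standard residues.
    seq = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if len(seq) <= lam:
        raise ValueError("sequence must contain more than `lam` standard residues")

    # Conventional amino acid composition (20 frequencies).
    counts = Counter(seq)
    freqs = [counts[a] / len(seq) for a in AMINO_ACIDS]

    # theta_k: mean squared property difference between residues k positions apart.
    thetas = []
    for k in range(1, lam + 1):
        total = 0.0
        for i in range(len(seq) - k):
            a, b = seq[i], seq[i + k]
            total += sum((p[a] - p[b]) ** 2 for p in properties) / len(properties)
        thetas.append(total / (len(seq) - k))

    # Joint normalization, so the components of the vector sum to 1.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


if __name__ == "__main__":
    props = [standardize(HYDROPATHY)]
    vector = pse_aac("MKTAYIAKQRQISFVKSHFSRQ", props)
    print(len(vector))
```

Because the output length depends only on `lam`, sequences of different lengths map to vectors of the same dimension, which is the property the abstract relies on when feeding the encoding to the classifiers.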
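The three scores reported for both methods (F1, MCC, and AUC) can be computed from binary labels and classifier scores as sketched below, using their standard definitions. The labels, scores, and the 0.5 decision threshold in the usage example are made-up illustrative values, not data from the thesis, and the function names are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives; 1 = hot spot, 0 = non-hot spot."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive receives a higher
    score than a randomly chosen negative, with ties counted as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Illustrative values only.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    print(f1_score(tp, fp, fn), mcc(tp, tn, fp, fn), auc(y_true, scores))
```

F1 and MCC depend on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason the three are commonly reported together.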
Keywords/Search Tags:Hot spots, Protein-nucleic acid interaction, Protein sequence, Protein structure, Machine learning