With the completion of genome sequencing projects for human and other species, the growing volume of genome sequence data provides coding information for hundreds of thousands of proteins. Experiments have shown that protein-protein interactions depend critically on just a few residues, and that only these crucial residues, termed hot spots, contribute significantly to the binding free energy of protein-protein interactions. Hot spots play crucial roles in various protein-protein interaction interfaces. Hot spot prediction is therefore increasingly important for understanding the essence of protein-protein interactions and for narrowing the search space in drug design.

A number of computational methods have been explored for hot spot prediction at protein-protein interfaces. In this thesis, we present a two-step feature selection and prediction method that uses a least squares support vector machine (LS-SVM) based on Bayesian inference, with features obtained from protein sequence and structure; a central issue is how to choose effective features. We first extract 65 features from a combination of protein sequence and structure information. We then design a two-step feature selection method to select an effective subset of these features. During modeling, the hyper-parameters of the LS-SVM are selected within a Bayesian framework, which chooses the parameter values that maximize the posterior probability. With these optimized parameters, the resulting model can predict hot spots accurately. The proposed method is applied to an independent test dataset. Empirical studies show that our method yields better prediction accuracy than previously published methods.