Predicting Hot Spots in Protein Interfaces Based on Feature Selection Using mRMR Combined with SVM-Forward and Its Biological Application

Posted on: 2018-10-31, Degree: Master, Type: Thesis
Country: China, Candidate: Y B Chen
GTID: 2370330512494296, Subject: Computer Science and Technology
Abstract/Summary:
Biological research in recent years can be broadly divided into a genomics era and a post-genomics era. With genome projects completed, the post-genomics era focuses more closely on proteins. Proteins carry out life activities at the cellular level through protein-protein interactions, in processes such as DNA replication, DNA transcription, signal transduction, regulation, and gene translation; protein-protein interaction has therefore become one of the most important research areas of the post-genomics era. Previous studies of protein-protein interaction networks showed that, during protein binding, a very small number of residues release large amounts of energy and contribute significantly to the binding free energy of the interaction. Researchers named these residues hot spots. Hot spots form small clusters of residues on the protein interface: they are not evenly distributed across the interface but tend to aggregate. Although hot spots occupy only a very small area of the protein surface, they play a crucial role in protein binding and are important for maintaining protein function and the stability of protein-protein interactions.

Hot spots can be detected experimentally, but such methods are complicated to perform, expensive, and time-consuming. Computational methods for predicting hot spots on protein surfaces have therefore been developed, mainly based on empirical formulas and on machine learning. These methods have achieved some success in predicting hot spots in protein-protein interactions, but there is still room for improvement.

In this thesis, we propose a machine-learning method for predicting hot spots in proteins. We extracted 143 features from protein sequence, structure, and interaction information, drawing on both previous studies and new features of our own. We then used minimal redundancy maximal relevance (mRMR) combined with Support Vector Machine (SVM) forward selection to select features. After feature analysis, an optimal 41-dimensional feature subset was selected and used to build a Random Forest (RF) prediction model for hot spots. Evaluated on an independent test set, our model, HPcms, achieved an F1 score of 0.625 and a Matthews correlation coefficient (MCC) of 0.518, the highest values among the previous methods compared. The new features proposed in this work also ranked prominently after feature selection.

Finally, as a biological application, we used our method to predict hot spots in epitopes on the antigen surface in antibody-antigen interactions. We then used the Multigraft module of the Rosetta software suite to transplant the epitope of 3ztn.pdb (from our laboratory) onto other proteins, selected candidates from the computer simulation results, and carried out biological experiments on them. Our method is helpful for predicting hot spots in epitopes and for epitope transplantation.
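The feature-selection step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's code: features are assumed to be already discretized, mRMR uses the common difference form (relevance to the label minus mean mutual information with already-selected features), and "SVM-forward" is read as incremental forward selection over the mRMR ranking, keeping the best-scoring prefix. The SVM itself is replaced by a hypothetical leave-one-out 1-nearest-neighbour scorer so the sketch runs with the standard library only; all function names and the toy data are illustrative.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """I(X;Y) in nats for two discrete sequences of equal length."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def mrmr_rank(columns: List[Sequence[int]], labels: Sequence[int]) -> List[int]:
    """Greedy mRMR ranking: relevance minus mean redundancy (difference form)."""
    relevance = [mutual_information(col, labels) for col in columns]
    remaining = list(range(len(columns)))
    selected: List[int] = []
    while remaining:
        def score(j: int) -> float:
            if not selected:
                return relevance[j]  # first pick: the most relevant feature
            redundancy = sum(mutual_information(columns[j], columns[s]) for s in selected)
            return relevance[j] - redundancy / len(selected)
        best = max(remaining, key=score)  # ties go to the lower feature index
        selected.append(best)
        remaining.remove(best)
    return selected


def forward_select(ranking: List[int],
                   evaluate: Callable[[List[int]], float]) -> Tuple[List[int], float]:
    """Score every prefix of the ranking and keep the best one."""
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranking) + 1):
        score = evaluate(ranking[:k])
        if score > best_score:  # strict '>' keeps the smallest prefix on ties
            best_k, best_score = k, score
    return ranking[:best_k], best_score


def loo_1nn_accuracy(columns: List[Sequence[int]], labels: Sequence[int],
                     subset: List[int]) -> float:
    """Hypothetical stand-in for the SVM: leave-one-out 1-NN accuracy (Hamming distance)."""
    n = len(labels)
    correct = 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        nearest = min(others, key=lambda j: sum(columns[f][i] != columns[f][j] for f in subset))
        correct += labels[nearest] == labels[i]
    return correct / n


if __name__ == "__main__":
    # Hypothetical toy data: f1 duplicates f0, f3 is independent of the label.
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    columns = [
        [1, 1, 1, 0, 0, 0, 0, 0],  # f0: relevant
        [1, 1, 1, 0, 0, 0, 0, 0],  # f1: exact copy of f0 (redundant)
        [1, 1, 1, 1, 1, 1, 0, 0],  # f2: relevant, different errors
        [1, 0, 1, 0, 1, 0, 1, 0],  # f3: noise
    ]
    ranking = mrmr_rank(columns, labels)
    print("mRMR ranking:", ranking)  # -> [0, 2, 3, 1]: the duplicate f1 is ranked last
    subset, acc = forward_select(ranking, lambda s: loo_1nn_accuracy(columns, labels, s))
    print("selected:", subset, "LOO accuracy:", acc)
```

On the toy data the copy `f1` is exactly as relevant as `f0` yet ends up last, which is what the redundancy term is for. Plugging in a real SVM (for example, a cross-validated score from an SVM library) would only change the `evaluate` callback passed to `forward_select`.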
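The two scores reported for HPcms, F1 and MCC, are standard confusion-matrix metrics; because hot spots are only a small fraction of interface residues, they are more informative than plain accuracy. A minimal sketch of how they are computed is below. The function names are illustrative, returning 0.0 when MCC is undefined (a zero marginal total) is one common convention assumed here, and the counts in the demo are hypothetical, not results from this thesis.

```python
import math


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN), equivalent to the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; uses all four confusion-matrix cells."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: return 0.0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical confusion-matrix counts for illustration only.
    tp, fp, fn, tn = 8, 2, 2, 8
    print(f"F1  = {f1_score(tp, fp, fn):.3f}")  # F1  = 0.800
    print(f"MCC = {mcc(tp, tn, fp, fn):.3f}")   # MCC = 0.600
```

MCC ranges from -1 to 1 and stays near 0 for a predictor that ignores the minority class, which is why it is often reported alongside F1 for imbalanced residue-level prediction.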
Keywords/Search Tags: Protein-Protein Interaction, Hot Spots, Epitope Transplantation