Feature Extraction And Learning Algorithm For Protein-ligand Binding Sites Prediction

Posted on: 2019-12-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Hu
GTID: 1360330575479564
Subject: Control Science and Engineering
Abstract/Summary:
Protein-ligand interactions are indispensable for biological activity and play important roles in virtually all biological processes. The interaction between a protein and its ligand is dominated by a small number of ligand-binding residues, which are called protein-ligand binding sites. Accurately identifying these binding sites is therefore important for protein function analysis, for understanding the relationships among biological molecules, and for drug design. Because identifying protein-ligand binding sites by biological experiment is costly and time-consuming, simple and effective computational prediction methods are urgently needed, and computational binding-site prediction has become an active research topic in bioinformatics. Because protein-ligand interactions are complex and diverse, however, predicting binding sites, especially directly from protein sequences, remains a challenging problem.

This dissertation focuses on sequence-based protein-ligand binding site prediction and studies the application of machine learning algorithms to it. After summarizing the existing computational methods, it identifies three essential scientific problems that remain to be solved: class imbalance learning, feature learning, and learning under the massive and steady growth of protein data. Several solutions to these three problems are proposed and used to improve the performance of protein-ligand binding site prediction. The main contributions are summarized as follows:

(1) Two sampling-based learning methods are proposed to address class imbalance in protein-ligand binding site prediction: an ensemble of support vector machines (SVMs) built on random under-sampling (RUS), denoted RUS-SVMs, and an SVM method based on supervised over-sampling. Based on these two methods, three binding-site predictors, i.e., TargetATP, Targets, and TargetSOS, are developed for academic communication and practical application. Experimental results on benchmark datasets demonstrate that the proposed methods handle the class imbalance problem effectively and that the corresponding predictors outperform state-of-the-art methods.

(2) To enhance prediction performance, centered linear kernel target alignment (CLKTA) is proposed to fuse features from different views; a sparse representation algorithm is employed to extract more discriminative features from protein evolutionary information; and an integration of three feature selection methods, i.e., joint Laplacian feature weights learning, a Fisher-score-based feature selection algorithm, and a Laplacian-score-based feature selection algorithm, is used to learn more discriminative feature information from the original feature space. Based on these three feature learning methods, three ligand-binding site predictors are designed, and their web servers are established for academic communication and practical application.

(3) To address binding-site prediction in the protein big-data era, three query-driven methods operating at different levels, i.e., the sequence homology level, the sample similarity level, and the sequence profile similarity level, are proposed to learn from all available protein data. Here, query-driven means that a dedicated prediction model is trained for each query residue or protein. Based on these query-driven methods, three predictors, i.e., OSML, TargetNUCs, and TargetLBS, are developed, and their web servers are established for academic communication and practical application.
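
The random under-sampling ensemble mentioned in item (1) can be illustrated with a minimal sketch. This is not the dissertation's implementation. It only shows the general idea of training several base models on balanced subsets, each keeping every binding (minority) residue plus an equal-size random draw of non-binding residues. The function names, the caller-supplied `train` function (which could wrap an SVM), and the use of mean score fusion are all assumptions made for illustration; the abstract does not state how the base SVM outputs are combined.

```python
import random
from statistics import mean


def rus_subsets(labels, n_subsets, seed=0):
    """Build balanced index subsets by random under-sampling.

    Each subset keeps every positive (label 1) index and an equal number
    of negative (label 0) indices drawn without replacement.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos or len(neg) < len(pos):
        raise ValueError("expected a non-empty minority class labelled 1")
    return [pos + rng.sample(neg, len(pos)) for _ in range(n_subsets)]


def train_ensemble(features, labels, train, n_subsets=5, seed=0):
    """Train one base model per balanced subset.

    `train` is a hypothetical caller-supplied function: it takes
    (feature_rows, label_rows) and returns a callable that scores one row.
    """
    models = []
    for idx in rus_subsets(labels, n_subsets, seed):
        subset_x = [features[i] for i in idx]
        subset_y = [labels[i] for i in idx]
        models.append(train(subset_x, subset_y))
    return models


def ensemble_score(models, x):
    """Fuse base-model scores by simple averaging (an assumed fusion rule)."""
    return mean(m(x) for m in models)
```

Because every subset reuses all minority samples while drawing different majority samples, no binding residue is discarded, and the ensemble as a whole sees more of the non-binding class than any single balanced subset would.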
Keywords/Search Tags: sequence-based prediction, class imbalance learning, feature learning, protein big-data era, query-driven