
Predicting Protein-ligand Binding Residues With Deep Convolutional Neural Networks

Posted on: 2020-12-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Cui
Full Text: PDF
GTID: 2370330596468155
Subject: Software engineering
Abstract/Summary:
With the completion of Human Genome Project sequencing, protein research has become one of the main research directions in the life sciences. Ligand-binding proteins play key roles in many biological processes, and identifying protein-ligand binding residues is important for understanding the biological functions of proteins. Because experimental determination is technically difficult and costly, processing massive numbers of proteins requires computational methods. Existing computational methods can be roughly categorized as sequence-based or 3D-structure-based; the fundamental difference between the two is whether 3D-structure data are used. All of these methods are based on traditional machine learning. Across a series of binding residue prediction tasks, 3D-structure-based methods generally outperform sequence-based methods. However, because a huge number of proteins have known amino acid sequences, sequence-based methods have considerable room for improvement as deep learning develops. Research on predicting protein-ligand binding residues with deep learning is therefore needed.

The main research work and contributions of this thesis are as follows:

· We propose a new deep-learning approach for protein-ligand binding residue prediction. This method (Deepsi) uses only sequence profiles, which contain seven types of features: position-specific score matrix, relative solvent accessibility, secondary structure, dihedral angle, conservation scores, residue type, and position embeddings. Deepsi uses a fully convolutional network, which enables it to process variable-length sequences. The network is mainly composed of stacked convolution layers. The extracted features are finally combined through one-by-one (1×1) convolution kernels and a softmax to predict whether each residue is a binding residue. The effective context scope grows as the number of convolutional layers increases. A large effective context scope captures long-distance dependencies between residues, and stacking several layers allows the maximum dependency length to be controlled precisely. The state-of-the-art ligand-binding method COACH and some of its submethods are selected as baselines. The methods are tested on a set of 151 nonredundant proteins and three extended test sets. Experiments indicate that Deepsi can be optimized effectively on the training sets and generalizes well to the test sets without any sampling. The improvements in MCC and precision are no less than 0.05 and 16%, respectively.

· We propose a second deep-learning approach for protein-ligand binding residue prediction that exploits the aggregation of binding residues. This method (iDeepsi) uses the same features as Deepsi. To exploit the aggregation, new modules are added to the Deepsi network to extract features from the context labels or prediction results. iDeepsi improves parallelism at test time by optimizing the forward-propagation mechanism. Deepsi and iDeepsi share all the datasets. iDeepsi can also be optimized effectively and generalizes well without any sampling. Experiments show that the improvements in MCC and precision are no less than 0.07 and 19%, respectively.

Without using any templates that include 3D-structure data, Deepsi and iDeepsi significantly outperform existing sequence-based and 3D-structure-based methods, including COACH. In addition, a training data augmentation method that slightly improves performance is discussed in this study.
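
The relationship between network depth and effective context scope described in the abstract can be illustrated with a short calculation. The sketch below is not the thesis's implementation: it assumes stride-1 one-dimensional convolutions with odd kernel sizes and symmetric padding, and the function names, kernel sizes, and depths are hypothetical values chosen for illustration, since the abstract does not state Deepsi's actual configuration.

```python
from typing import Optional, Sequence


def effective_context(kernel_sizes: Sequence[int],
                      dilations: Optional[Sequence[int]] = None) -> int:
    """Return how many input residues one output position can see.

    Assumes a stack of stride-1 1D convolutions. Each layer with kernel
    size k and dilation d widens the span by (k - 1) * d residues.
    """
    if dilations is None:
        dilations = [1] * len(kernel_sizes)
    if len(dilations) != len(kernel_sizes):
        raise ValueError("kernel_sizes and dilations must have equal length")

    span = 1  # with no layers, a position sees only itself
    for k, d in zip(kernel_sizes, dilations):
        if k < 1 or d < 1:
            raise ValueError("kernel sizes and dilations must be positive")
        span += (k - 1) * d
    return span


def max_dependency_length(kernel_sizes: Sequence[int],
                          dilations: Optional[Sequence[int]] = None) -> int:
    """Return the farthest residue offset, on either side, that can
    influence a prediction (assumes odd kernels, symmetric padding)."""
    return (effective_context(kernel_sizes, dilations) - 1) // 2


if __name__ == "__main__":
    # Hypothetical configuration: kernel size 5 at every layer.
    for depth in (1, 2, 4, 8):
        kernels = [5] * depth
        print(depth, effective_context(kernels), max_dependency_length(kernels))
```

Under these assumptions the span grows linearly with depth, so adding or removing one layer changes the maximum dependency length by a fixed, known amount. A one-by-one kernel contributes (1 - 1) = 0 to the span, which is consistent with using it only to combine features at each position rather than to widen the context.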
Keywords/Search Tags: protein, ligand, binding residues, long-distance dependencies, deep convolutional networks