Predicting Protein-ligand Binding Residues With Deep Convolutional Neural Networks

Posted on:2020-12-13

Degree:Master

Type:Thesis

Country:China

Candidate:Y F Cui

Full Text:PDF

GTID:2370330596468155

Subject:Software engineering

Abstract/Summary:

Since the completion of Human Genome Project sequencing, protein research has become one of the main research directions in the life sciences. Ligand-binding proteins play key roles in many biological processes, and identifying protein-ligand binding residues is important for understanding the biological functions of proteins. Because experimental determination is technically difficult and costly, processing proteins at massive scale requires computational methods. Existing computational methods can be roughly categorized as sequence-based or 3D-structure-based; the fundamental difference between the two is whether 3D-structure data are used. All of these methods are based on traditional machine learning. Across a series of binding residue prediction tasks, 3D-structure-based methods generally outperform sequence-based methods. However, given the huge number of proteins with known amino acid sequences, sequence-based methods have considerable room for improvement as deep learning develops. Research on predicting protein-ligand binding residues with deep learning is therefore needed. The main research work and contributions of this thesis are as follows:

• We propose a new deep-learning approach for protein-ligand binding residue prediction. This method (Deepsi) uses only sequence profiles, which contain seven types of features: position-specific score matrix, relative solvent accessibility, secondary structure, dihedral angle, conservation scores, residue type, and position embeddings. Deepsi uses a fully convolutional network, which enables it to process variable-length sequences. The network is composed mainly of different convolutional layers stacked together. The extracted features are finally combined through one-by-one convolution kernels and a softmax to predict whether each residue is a binding residue. The effective context scope expands as the number of convolutional layers increases. A large effective context scope can capture long-distance dependencies between residues, and stacking several layers allows the maximum length of these dependencies to be precisely controlled. The state-of-the-art ligand-binding method COACH and some of its submethods are selected as baselines. The methods are tested on a set of 151 nonredundant proteins and three extended test sets. Experiments indicate that Deepsi can be optimized effectively on the training sets and generalizes well to the test sets without any sampling. The improvements in MCC and precision are no less than 0.05 and 16%, respectively.

• We propose a second deep-learning approach for protein-ligand binding residue prediction that exploits the aggregation of binding residues. This method (iDeepsi) uses the same features as Deepsi. To exploit the aggregation, new modules are added to the Deepsi network to extract features from the context labels or prediction results. iDeepsi improves the parallelism of testing by optimizing the forward propagation mechanism. Deepsi and iDeepsi share all datasets. iDeepsi can also be optimized effectively and generalizes well without any sampling. Experiments show that the improvements in MCC and precision are no less than 0.07 and 19%, respectively.

Without using any templates that include 3D-structure data, Deepsi and iDeepsi significantly outperform existing sequence-based and 3D-structure-based methods, including COACH. In addition, a training data augmentation method that slightly improves performance is discussed in this study.
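The architectural idea in the abstract (stacked convolutions whose context grows with depth, followed by a one-by-one convolution and softmax per residue) can be illustrated with a minimal pure-Python sketch. This is not the Deepsi implementation: the kernel sizes, channel counts, ReLU activation, zero padding, random weights, and all function names (`receptive_field`, `conv1d_same`, `predict`, `random_conv`) are assumptions chosen only for illustration.

```python
import math
import random


def receptive_field(kernel_sizes, dilations=None):
    """Residues seen by one output position after stacking stride-1 1D convolutions."""
    dilations = dilations or [1] * len(kernel_sizes)
    return 1 + sum((k - 1) * d for k, d in zip(kernel_sizes, dilations))


def conv1d_same(x, weights, bias):
    """Stride-1 1D convolution with zero padding (odd kernel sizes assumed).

    x: list of per-residue feature vectors; weights: [c_out][k][c_in].
    The output has one vector per residue, so any sequence length works.
    """
    length, c_in = len(x), len(x[0])
    k = len(weights[0])
    pad = k // 2
    out = []
    for i in range(length):
        row = []
        for w, b in zip(weights, bias):
            s = b
            for j in range(k):
                pos = i + j - pad
                if 0 <= pos < length:  # positions outside the sequence act as zeros
                    s += sum(w[j][c] * x[pos][c] for c in range(c_in))
            row.append(s)
        out.append(row)
    return out


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_conv(rng, c_in, c_out, k):
    """Hypothetical untrained layer: random weights, zero bias."""
    weights = [[[rng.uniform(-0.5, 0.5) for _ in range(c_in)]
                for _ in range(k)] for _ in range(c_out)]
    return weights, [0.0] * c_out


def predict(x, hidden_layers, head):
    """Stacked conv + ReLU layers, then a 1x1 conv and softmax per residue."""
    h = x
    for weights, bias in hidden_layers:
        h = [[max(0.0, v) for v in row] for row in conv1d_same(h, weights, bias)]
    return [softmax(row) for row in conv1d_same(h, *head)]


rng = random.Random(0)
N_FEATURES, N_HIDDEN = 4, 8  # illustrative sizes, not the thesis's feature dimensions
hidden_layers = [random_conv(rng, N_FEATURES, N_HIDDEN, 3),
                 random_conv(rng, N_HIDDEN, N_HIDDEN, 3)]
head = random_conv(rng, N_HIDDEN, 2, 1)  # 1x1 conv: binding vs. non-binding

for length in (5, 9):  # the same weights handle sequences of different lengths
    seq = [[rng.random() for _ in range(N_FEATURES)] for _ in range(length)]
    probs = predict(seq, hidden_layers, head)
    print(length, len(probs), receptive_field([3, 3, 1]))
```

Because every layer is a convolution, the same weights apply to sequences of any length, and the context seen by each residue is set by the stack: each stride-1 layer with kernel size k and dilation d adds (k - 1) * d residues to the receptive field.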

Keywords/Search Tags:

protein, ligand, binding residues, long-distance dependencies, deep convolutional networks

Related items

1	Identifying The Sulphate Ion Binding Residues In Proteins Based On SVM Algorithm
2	Identification Of Ion Ligand Binding Residues Based On Optimized Feature
3	Identification Of Protein-metal Ion Ligand Binding Sites Based On Deep Learning Algorithm
4	Study On Predicting Nucleotide-Binding Protein Using Deep Learning Approach
5	Identification Of Metal Ion Ligand Binding Residues In Proteins Based On GBM Algorithm
6	Recognition Of Ligand-binding Sites In Proteins Based On Deep Learning
7	Prediction Of Binding Affinity Between Protein-ligand Molecules Based On Graph Neural Network
8	Computational Researches On Sequence-Based Transmembrane Protein-Ligand Binding
9	Molecular Dynamics Simulation Combined With Deep Learning To Explore The Effect Of Amino Acid Mutation Or Ligand Binding On Enzyme Activity
10	On The Prediction Of DNA-binding Proteins Only From Primary Sequences: A Deep Learning Approach