Research On Protein-protein Binding Sites Prediction Method Based On Sequence Information

Posted on:2019-08-07

Degree:Master

Type:Thesis

Country:China

Candidate:W He

Full Text:PDF

GTID:2370330566998321

Subject:Computer Science and Technology

Abstract/Summary:

Organisms rely on interactions between proteins, and between proteins and other substances, to carry out their life activities. Studying protein interactions is therefore important for understanding the mechanisms of biological activity, and it has broad theoretical and practical prospects. This thesis studies protein binding site prediction based on machine learning. The basic approach is to extract and combine several types of protein-related information, represent the amino acid sequence with suitable feature vectors, and then apply a classification algorithm to these features to determine the category of each amino acid residue. The thesis studies sequence feature extraction based on the long short-term memory (LSTM) network and improves the original algorithm by analyzing the biological background of the problem. In addition, from the perspective of multi-feature learning, it attempts to use different types of information to construct a two-layer ensemble learning model that improves prediction performance.

The thesis first introduces a sequence information extraction method based on an improved LSTM model, with two specific improvements. First, to reflect the clustering of protein binding sites along the sequence, the output layer of the network is connected to the input layer at the next time step, which introduces the category information of residues adjacent to the target amino acid into the network. Second, to avoid imposing an arbitrary reading direction on the protein sequence, the training process is modified to train two independent prediction models: the amino acid sequences are scanned forward and backward, and each direction is used to train its own network. The weighted results of the two classifiers then serve as the basis for the final classification. Comparison experiments and analysis of the results verify the effectiveness of the algorithm.

Because the amino acid residues in a protein chain differ in their physicochemical properties, primary structure, and spatial structure, the thesis also introduces an ensemble learning model that uses multiple types of features to represent the amino acid sequence more effectively. The model has two layers. The first layer consists of three base classifiers, which use the position-specific scoring matrix (PSSM), bi-gram, and pseudo-amino acid composition features, respectively. The data set is partitioned so that each base classifier is trained and predictions are produced for all samples using a strategy similar to cross-validation. The predictions of the base classifiers are then combined with the sequence features extracted by the improved LSTM model in the previous chapter to form the feature vector of the second layer, which performs the final classification. Experiments are carried out on three groups of data sets divided according to sequence alignment results. Finally, the thesis analyzes the relationship between the prediction performance of the base classifiers and that of the ensemble classifier, examines the relevant model parameters, and compares the results with previous methods to verify the effectiveness of the proposed method.
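
The two combination steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the equal default weight, the round-robin fold assignment, and the toy nearest-centroid base classifier are all assumptions made for illustration. The sketch shows (1) a weighted average of per-residue scores from the forward and backward models, and (2) a cross-validation-like scheme in which each sample is predicted by a base classifier that did not see it during training, followed by concatenation with sequence features to form second-layer inputs.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Scorer = Callable[[Vector], float]


def combine_directional_scores(forward: Sequence[float],
                               backward_reversed: Sequence[float],
                               w_forward: float = 0.5) -> List[float]:
    """Weighted average of per-residue scores from the two directional models.

    Assumption: the backward model emits scores in scan order (last residue
    first), so they are reversed here to line up with the forward scores.
    """
    backward = list(reversed(backward_reversed))
    return [w_forward * f + (1.0 - w_forward) * b
            for f, b in zip(forward, backward)]


def fit_centroid(xs: Sequence[Vector], ys: Sequence[int]) -> Scorer:
    """Toy stand-in for a base classifier (hypothetical, not from the thesis).

    Returns a scorer giving 1.0 if a sample is closer to the positive-class
    centroid than to the negative-class centroid, else 0.0.
    Both classes must be present in the training data.
    """
    dim = len(xs[0])

    def centroid(label: int) -> List[float]:
        rows = [x for x, y in zip(xs, ys) if y == label]
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    pos, neg = centroid(1), centroid(0)

    def score(x: Vector) -> float:
        d_pos = sum((a - b) ** 2 for a, b in zip(x, pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, neg))
        return 1.0 if d_pos < d_neg else 0.0

    return score


def out_of_fold_predictions(xs: Sequence[Vector], ys: Sequence[int],
                            fit: Callable[[Sequence[Vector], Sequence[int]], Scorer],
                            n_folds: int) -> List[float]:
    """Predict every sample with a model trained on the other folds.

    Assumption: samples are assigned to folds round-robin (index mod n_folds).
    """
    n = len(xs)
    preds = [0.0] * n
    for k in range(n_folds):
        train = [i for i in range(n) if i % n_folds != k]
        held_out = [i for i in range(n) if i % n_folds == k]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            preds[i] = model(xs[i])
    return preds


def second_layer_features(base_preds: Sequence[Sequence[float]],
                          seq_features: Sequence[Vector]) -> List[List[float]]:
    """Concatenate each sample's base-classifier outputs with its sequence features."""
    return [list(p) + list(s) for p, s in zip(zip(*base_preds), seq_features)]
```

In this sketch, `out_of_fold_predictions` would be called once per base feature type (PSSM, bi-gram, pseudo-amino acid), and the resulting lists passed together to `second_layer_features`. Keeping the first-layer predictions out-of-fold is a common way to stop the second layer from training on outputs that the base classifiers produced for samples they had already seen.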

Keywords/Search Tags:

prediction of protein binding sites, long short-term memory networks, feature fusion, ensemble learning strategies

Related items

1	Prediction Method Of Gene Methylation Sites Based On LSTM With Compound Coding Characteristics
2	Research On The Prediction Model Of Protein-RNA Binding Sites Based On The Two-dimensional Fusion Of Graph Convolution Neural Network
3	The Research On The Prediction Method Of Protein Succinylation Sites Based On PU Learning And Deep Learning Technology
4	Prediction Of DNA And RNA Binding Proteins Based On Machine Learning
5	Prediction Of Protein-DNA Binding Site Based On CNN-LSTM
6	Research On Meteorological Prediction Based On Long Short-term Memory Network
7	Research And Application Of Landslide Susceptibility Prediction Based On Long Short-term Memory Deep Neural Network
8	Identifying Splicing Sites Of Circular RNA Based On Deep Learning
9	Research On Protein Domain Boundary Prediction Based On Deep Learning
10	Research On Flash Flood Forecasting Based On Long Short-Term Memory Networks