
Protein Contact Map Prediction Using Deep Convolutional Neural Network

Posted on: 2020-09-03
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Xu
GTID: 2370330620459974
Subject: Control Science and Engineering
Abstract/Summary:
Proteins are an important class of macromolecules that play significant roles in catalysis and transport. The study of protein function has a substantial impact on the medical and pharmaceutical industries. Because the function of a protein is largely determined by its structure, protein structure has long been a research focus. Thanks to well-developed sequencing tools, protein sequence information continues to grow rapidly. Structural information grows slowly, however, because resolving protein structures experimentally is time-consuming and laborious. Predicting the folded structure of a protein from its sequence has therefore become an important problem. With the development of computer science, the interdisciplinary field of bioinformatics has emerged to address such problems, and computational protein structure prediction has become an active research area. Predicting the protein contact map is a key step in predicting protein structure, and a good contact map prediction can greatly improve the accuracy of structure prediction.

This thesis studies protein contact map prediction based on deep convolutional neural networks. Datasets are built for training and testing. A predictor is constructed on a convolutional neural network to improve the efficiency of contact map prediction. The data imbalance problem is addressed with a weighted cross-entropy loss function. The importance of the network's receptive field is discussed, and dilated convolutions are employed to improve contact map prediction. Finally, an iterative framework is proposed to further improve performance. Specifically, the research mainly includes the following contents:

1. Because public data for protein contact map prediction are lacking, a data set containing 9995 proteins was built from the international protein structure database PDB. Each protein entry contains seven different features, including structural features and physicochemical characteristics. In addition to two general test sets, an independent test set of 207 proteins was built for evaluating algorithm performance.

2. Most traditional contact map prediction algorithms are inefficient because they make predictions for residue pairs one at a time. This thesis proposes a contact map prediction algorithm based on convolutional neural networks, which has a simple pipeline and high prediction efficiency. The predictor is built on a residual network, and its design incorporates several considerations specific to contact map prediction, including variable input size and large memory requirements. The resulting end-to-end algorithm predicts all residue pairs of a protein simultaneously.

3. To address the imbalance between positive and negative samples in contact map prediction, a weighted cross-entropy loss function is introduced. Because the algorithm predicts all residue pairs of a protein simultaneously, the loss weights are assigned by a pre-computed weight matrix applied as a mask. As a result, training is not dominated by negative samples or by large proteins.

4. Because a small receptive field causes the model to predict the contact maps of large proteins poorly, an algorithm with a large receptive field based on dilated convolutions is proposed to improve contact map prediction. In algorithms based on convolutional neural networks, the receptive field determines how much surrounding information is considered when predicting a pair of residues, so a large receptive field plays an important role in contact map prediction. With dilated convolutions, the algorithm obtains a larger receptive field while maintaining the size of the feature map and without increasing model complexity.

5. Building on the above prediction model, a prediction framework combining iteration and ensemble learning is implemented, which improves prediction performance. Adding a new two-dimensional prediction matrix to the input features improves the prediction accuracy of a single model. Inspired by this, the model's own prediction matrix is fed back into its input iteratively to improve the final prediction accuracy. Finally, two ensemble learning approaches are compared, and averaging is used to improve the predictions of the overall framework.
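
The masked, weighted cross-entropy loss of the third contribution can be illustrated with a minimal sketch. The abstract does not give the actual weighting scheme, so the one below is an assumption: contacts and non-contacts each receive half of the total weight, and the weights of one protein sum to 1, so that neither the negative class nor a long sequence dominates the loss. The function names `build_weight_mask` and `weighted_cross_entropy` are hypothetical, and plain Python lists stand in for the tensors a real implementation would use.

```python
import math

def build_weight_mask(contacts):
    """Pre-compute a per-pair weight matrix for one protein.

    Assumed scheme (not taken from the thesis): positives and negatives
    each get half of the total weight, and all weights sum to 1.
    """
    n_pos = sum(sum(row) for row in contacts)
    n_total = sum(len(row) for row in contacts)
    n_neg = n_total - n_pos
    if n_pos == 0 or n_neg == 0:
        # Degenerate map with a single class: fall back to uniform weights.
        return [[1.0 / n_total for _ in row] for row in contacts]
    w_pos, w_neg = 0.5 / n_pos, 0.5 / n_neg
    return [[w_pos if y == 1 else w_neg for y in row] for row in contacts]

def weighted_cross_entropy(probs, contacts, mask, eps=1e-12):
    """Binary cross entropy over all residue pairs, scaled element-wise by the mask."""
    loss = 0.0
    for p_row, y_row, w_row in zip(probs, contacts, mask):
        for p, y, w in zip(p_row, y_row, w_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            loss -= w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return loss

# Toy 3-residue protein with one symmetric contact between residues 0 and 2.
contacts = [[0, 0, 1],
            [0, 0, 0],
            [1, 0, 0]]
# Hypothetical predicted contact probabilities for every residue pair.
probs = [[0.1, 0.2, 0.7],
         [0.2, 0.1, 0.3],
         [0.7, 0.3, 0.1]]

mask = build_weight_mask(contacts)
loss = weighted_cross_entropy(probs, contacts, mask)
```

Because the mask is computed once per protein from its labels, the loss for all residue pairs can be evaluated in a single pass, which matches the simultaneous prediction described above. Under this assumed normalization, a protein of any length contributes the same total weight to a training batch.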
Keywords/Search Tags: protein contact map, deep convolutional neural networks, data imbalance, dilated convolution, iterative framework, ensemble learning