
Prediction Of Protein Contact Map Based On ResNet And DenseNet

Posted on: 2021-04-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Lin
Full Text: PDF
GTID: 2370330602982624
Subject: Computer Science and Technology
Abstract/Summary:
With the development of protein sequencing technology, the number of protein sequences in protein databases is growing exponentially. Because determining protein structure through biological experiments cannot keep pace with current research requirements, predicting the protein contact map is an effective way to determine protein structure. Contact map prediction not only helps determine the three-dimensional structure of a protein but can also be used to study protein function, and it has become an increasingly important tool for modeling three-dimensional structure when no homologous structure is available. In this thesis, we propose a novel deep neural network framework for protein contact prediction. The main contents are as follows:

(1) In terms of data extraction, considering the important role of sequence information in bioinformatics, this thesis extracts the secondary structure and solvent accessibility of the protein to expand the feature set. In addition, the primary role of multiple sequence alignments in contact map prediction is to provide features that carry sequence information. HHblits is used to generate more accurate multiple sequence alignments, from which the PSSM and PSFM features are derived. Finally, pairwise features are extracted with an evolutionary coupling analysis method and combined with the aforementioned features.

(2) This thesis proposes a deep neural network framework that combines ResNet and DenseNet for protein contact map prediction with the new feature set. The framework uses a 1D ResNet to process the sequential features generated from the multiple sequence alignments. Besides PSSM, secondary structure, and solvent accessibility, it introduces a new input feature, the position-specific frequency matrix (PSFM). With ResNet's residual modules and identity mappings, the network processes the sequential features effectively, after which an outer concatenation function combines the sequential and pairwise features. A final processing step that uses the dense connections of DenseNet improves prediction accuracy. To train the model, the method uses a cross-entropy loss function to reduce the influence of discrete data and applies the stochastic gradient descent algorithm to optimize the model's parameters.

(3) To verify the validity of the network framework, this thesis compares it with existing protein contact map predictors on five standard datasets. On the PDB25 dataset, the long-range contact prediction accuracy for the top L/k predictions (k = 10, 5, 2, 1) is 79.6%, 73.5%, 63.1%, and 47.8%, respectively. On the other four public datasets (CAMEO, Mems400, CASP12, and CASP13), the top-L long-range contact prediction accuracy is 42.0%, 47.1%, 40.3%, and 43.2%, respectively. These results show that the proposed method is more effective than other popular methods for predicting protein contact maps. In addition, this thesis visualizes the predicted protein contact maps and analyzes them briefly. Finally, it summarizes the research work on protein contact map prediction and looks ahead to future work.
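
Two steps named in the abstract can be illustrated with a short Python sketch: (a) an outer concatenation that turns per-residue (sequential) features into pairwise features, and (b) the long-range top-L/k accuracy used in the evaluation. This is not the thesis's implementation. The function names, the plain [v_i, v_j] concatenation, and the minimum sequence separation of 24 residues for "long-range" contacts are assumptions based on common practice in contact prediction, not details stated in the abstract.

```python
from typing import List, Sequence


def outer_concat(seq_feats: Sequence[Sequence[float]]) -> List[List[List[float]]]:
    """Build an L x L grid of pair vectors from L per-residue vectors.

    Assumption: the pair vector for residues (i, j) is simply v_i followed
    by v_j. Some published models also append a midpoint vector.
    """
    return [[list(vi) + list(vj) for vj in seq_feats] for vi in seq_feats]


def top_l_over_k_precision(
    probs: Sequence[Sequence[float]],
    contacts: Sequence[Sequence[bool]],
    k: int = 1,
    min_sep: int = 24,
) -> float:
    """Precision of the top L/k predicted contacts with |i - j| >= min_sep.

    probs[i][j] is the predicted contact probability, contacts[i][j] the
    true contact label, and L the sequence length. min_sep=24 is the usual
    long-range threshold (an assumption here, not taken from the abstract).
    """
    length = len(probs)
    # Collect each residue pair once (i < j) if it is far enough apart.
    candidates = [
        (probs[i][j], i, j)
        for i in range(length)
        for j in range(i + min_sep, length)
    ]
    # Rank by predicted probability, highest first, and keep the top L/k.
    candidates.sort(key=lambda item: item[0], reverse=True)
    top = candidates[: max(1, length // k)]
    if not top:
        return 0.0
    hits = sum(1 for _, i, j in top if contacts[i][j])
    # Divide by the number of pairs actually selected, which can be
    # smaller than L/k for short sequences.
    return hits / len(top)
```

Under these assumptions, calling `top_l_over_k_precision(probs, contacts, k)` with k = 10, 5, 2, and 1 gives the four kinds of values reported for PDB25, and k = 1 corresponds to the top-L figures reported for the other datasets.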
Keywords/Search Tags: Protein contact map prediction, Deep neural network, PSFM extraction, Multiple sequence alignments