Research On Protein Function Prediction Based On Deep Learning

Posted on: 2023-10-11
Degree: Master
Type: Thesis
Country: China
Candidate: W B Shi
GTID: 2530307070483384
Subject: Computer application technology
Abstract/Summary:
As the basic substance of life, proteins perform important functions in the body, such as transporting substances, regulating physiological activity and supplying energy. Understanding protein function is of great importance in areas such as disease analysis and the development of novel drugs. Traditional experimental methods for identifying protein function are cumbersome and have long experimental cycles, which makes it difficult to meet the growing demand for protein function annotation. This thesis aims to build deep learning models that improve the accuracy of protein function prediction. It studies both the co-occurrence of Gene Ontology (GO) terms and the fusion of sequence and structural features, with the goal of proposing a more reliable and practical computational method to support biologists in their research. The main work and contributions of this thesis are as follows:

(1) Existing computational methods only extract features from protein data and ignore the co-occurrence of GO terms. A new deep learning model, named DeepPFP-CO, is proposed to predict protein function. DeepPFP-CO extracts network topological features from a constructed protein interaction network using the DeepWalk algorithm and combines them with key sequence features captured by a CNN, a Bi-LSTM and the InterProScan tool to make preliminary predictions. To refine these preliminary predictions, DeepPFP-CO constructs a GO co-occurrence network based on the conditional probabilities among GO terms and explores the co-occurrence of GO terms with a graph convolutional network, improving the accuracy of protein function prediction. Experimental results show that DeepPFP-CO outperforms existing computational methods. To facilitate use by biologists, DeepPFP-CO is also provided as an easy-to-use server platform.

(2) To extract key features for function prediction from protein structures, this thesis proposes DeepPFP-SS, a deep learning model based on the fusion of structural information and sequence features. DeepPFP-SS takes the structural data predicted by AlphaFold2, constructs a network based on the spatial distance between amino acids, and uses graph attention networks to automatically learn high-dimensional structural features from this network. At the same time, DeepPFP-SS uses Word2vec, iFeature and InterProScan to encode protein sequences and extract the sequence features that are key to classification. Because DeepPFP-SS extracts features from both the 3D structure and the sequence of a protein, it integrates knowledge from both the structure and sequence domains and is therefore able to predict protein function accurately. Experimental results show that DeepPFP-SS achieves high prediction accuracy, providing biologists with a novel and accurate method for protein function prediction.
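The two graph constructions mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption: the conditional probability is estimated as a simple count ratio, the 10 Å distance cutoff is a hypothetical default, and the function names `go_cooccurrence` and `contact_edges` are invented for illustration and are not taken from DeepPFP-CO or DeepPFP-SS.

```python
import math


def go_cooccurrence(annotations, n_terms):
    """Estimate P(term j | term i) from per-protein GO annotations.

    annotations: list of sets, each holding the GO term indices of one protein.
    Returns an n_terms x n_terms list of lists; entry [i][j] is the fraction of
    proteins annotated with term i that are also annotated with term j.
    (Assumed estimator; the thesis may define the edge weights differently.)
    """
    occur = [0] * n_terms                              # proteins carrying term i
    joint = [[0] * n_terms for _ in range(n_terms)]    # proteins carrying both i and j
    for terms in annotations:
        for i in terms:
            occur[i] += 1
            for j in terms:
                if i != j:
                    joint[i][j] += 1
    return [
        [joint[i][j] / occur[i] if occur[i] else 0.0 for j in range(n_terms)]
        for i in range(n_terms)
    ]


def contact_edges(coords, cutoff=10.0):
    """Build an undirected residue graph from 3D coordinates.

    coords: list of (x, y, z) tuples, one per residue (for example C-alpha atoms).
    cutoff: distance threshold in angstroms (hypothetical default, not from the thesis).
    Returns a list of (i, j) pairs with i < j whose distance is at most cutoff.
    """
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                edges.append((i, j))
    return edges
```

In a pipeline of the kind the abstract describes, the first matrix would serve as the weighted adjacency of the GO co-occurrence network passed to a graph convolutional network, and the edge list would define the residue graph on which a graph attention network operates. Note that the conditional-probability matrix is generally asymmetric, so the resulting GO network is directed.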
Keywords/Search Tags:Gene Ontology, Protein Sequence, Co-occurrence, Graph Convolutional Networks, Protein Structure, Graph Attention Networks