Research On Protein Function Prediction Method Based On Network

Posted on:2022-05-09

Degree:Master

Type:Thesis

Country:China

Candidate:K Tan


GTID:2480306536954809

Subject:Computer Science and Technology

Abstract/Summary:


The widespread use of high-throughput experimental methods has produced many large-scale molecular and functional interaction networks. Combining these heterogeneous networks can yield more accurate function predictions. Recently, network-based protein function prediction methods have made significant progress. These methods use network embedding to capture nonlinear, low-dimensional feature representations of multiple heterogeneous networks, which are then used to predict protein function. However, most existing methods only extract low-dimensional features from multiple networks without considering the correlation between each network's feature vector and the function labels. In addition, as more sequences and other protein attribute information (such as protein domain and subcellular location information) become available, a large amount of attribute information can be used alongside networks for function prediction. Traditional network methods, however, can only analyze non-attributed networks and cannot be extended to attributed networks. Given these limitations of existing methods and the growth of biological data, designing effective methods for predicting protein function is of considerable value. This thesis proposes two protein function prediction methods.

(1) A multi-canonical correlated autoencoder (MCAE) model based on a deep neural network (DNN) is proposed. It uses the canonical correlated autoencoder (C2AE) to learn a deep latent space that jointly embeds features and labels, and applies this latent space to protein function prediction. The latent space that MCAE generates from multiple network features and labels takes into account the correlation between each network's features and the function labels. In addition, MCAE trains multi-network integration and function classification together, whereas most methods first obtain integrated features through multi-network fusion and then train a separate classifier to predict protein function. Training integration and classification jointly makes better use of the multiple networks. The MCAE model is implemented as a DNN architecture integrated with an autoencoder, which allows end-to-end learning and prediction. We test the MCAE model on human and yeast datasets and compare it with state-of-the-art methods. The results show that introducing a multi-label embedding framework into protein function prediction allows MCAE to achieve better prediction results.

(2) A deep learning model, GAE-GO, is proposed that predicts protein function using a graph neural network. GAE-GO can analyze attributed networks and uses a variational graph autoencoder (VGAE) to obtain embeddings of the nodes in the attributed network. The model consists of two parts: (a) an unsupervised representation model based on attributed graphs, which uses both network information (including the PPI network and the sequence similarity network, SSN) and node attributes (including protein sequence, subcellular location and protein domain) to generate an embedding for each protein; and (b) a fully connected DNN classifier that predicts protein function. Whereas traditional network-based methods analyze only non-attributed networks, GAE-GO extends function prediction to attributed networks, so that network structure and protein properties are considered simultaneously.
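
As an illustration of the kind of attributed-graph embedding that a variational graph autoencoder computes, the following is a minimal NumPy sketch of a single untrained forward pass. It is not the thesis implementation: the toy adjacency matrix, the feature and embedding dimensions, the random weights, and the function names `normalize_adjacency` and `vgae_forward` are all assumptions made for the example, and training (reconstruction loss plus KL term) is omitted.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops."""
    a_hat = adj + np.eye(adj.shape[0])
    deg = a_hat.sum(axis=1)  # always >= 1 because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def vgae_forward(adj, features, w0, w_mu, w_logvar, rng):
    """One forward pass of a two-layer graph-convolutional variational encoder
    followed by an inner-product decoder (weights are hypothetical, untrained)."""
    a_norm = normalize_adjacency(adj)
    hidden = np.maximum(a_norm @ features @ w0, 0.0)   # shared GCN layer + ReLU
    mu = a_norm @ hidden @ w_mu                        # per-node mean
    logvar = a_norm @ hidden @ w_logvar                # per-node log-variance
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps                # reparameterization trick
    edge_probs = 1.0 / (1.0 + np.exp(-(z @ z.T)))      # sigmoid(z z^T)
    return z, edge_probs


rng = np.random.default_rng(0)

# Toy graph: 4 proteins connected in a chain (stand-in for a PPI network).
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
# Toy node attributes: 6 made-up features per protein.
features = rng.standard_normal((4, 6))

# Hypothetical layer sizes: 6 -> 8 -> 2.
w0 = rng.standard_normal((6, 8)) * 0.1
w_mu = rng.standard_normal((8, 2)) * 0.1
w_logvar = rng.standard_normal((8, 2)) * 0.1

embeddings, edge_probs = vgae_forward(adj, features, w0, w_mu, w_logvar, rng)
print(embeddings.shape, edge_probs.shape)
```

In a pipeline of the shape described above, the rows of `embeddings` would, after training, serve as per-protein inputs to a separate multi-label classifier.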

Keywords/Search Tags:

function prediction, network integration, network embedding, multi-label classification, random walk


Related items

1	Study On Protein Function Prediction Based On Random Walk
2	Research And Application On Multi-relation Network Embedding
3	Algorithm Design And Implementation Predict Protein Function Based On The Random Walk
4	Research On Network Representation Learning Based On Multi-Granular Structure
5	Research On Heterogeneous Network Embedding Based On Random Walk
6	Prediction Of Credit Risk Contagion Of Listed Companies Based On Holding Network
7	Research On Key Genes Identification Methods Based On Multilayer Network
8	Research On Multi-label Classification Based On Decision Function
9	Prediction Of The Relationship Between CircRNAs And Complex Diseases Based On Heterogeneous Networks And Multi-data Fusion
10	Research Of Link Prediction Algorithm Based On Network Structure And Random Walk Theory