
Research On Protein Complex Recognition Based On Graph Network

Posted on: 2022-11-06  Degree: Master  Type: Thesis
Country: China  Candidate: X L Leng  Full Text: PDF
GTID: 2480306758992079  Subject: Biology
Abstract/Summary:
Proteins are essential components of the cells and tissues of living organisms and play key roles in a wide range of biological activities. Proteins rarely act alone: most of their functions are carried out jointly by several proteins, and the functional modules formed by these cooperating proteins are called protein complexes. Experimental methods for identifying protein complexes are costly, so computational methods are needed. Many successful computational methods have been proposed to detect protein complexes in protein-protein interaction (PPI) networks. Each has its own strengths, but there is still room for optimization and improvement. In this thesis, two identification algorithms and accompanying strategies are proposed, for dynamic and static protein networks respectively, and experiments demonstrate their effectiveness.

A PPI network is a typical graph structure. Because of its non-Euclidean nature, traditional data analysis methods applied to it often require a large amount of computation and are inefficient. With the development of graph network theory, graph embedding, an effective graph analysis technique, has been introduced into protein complex identification. Graph embedding reduces the dimensionality of the original network, producing low-dimensional vector representations of its nodes while retaining the key information of the graph. Graph embedding algorithms fall into four main categories: methods based on matrix factorization, random walks, neural networks, and autoencoders. In this thesis, several graph embedding algorithms are introduced and combined with the corresponding biological information to enhance the representational power of the graph. The main research contents of this thesis are as follows:

(1) For dynamic protein networks, DVCA, an algorithm that identifies core-attachment protein complexes with a variational graph autoencoder, is proposed. In DVCA, a dynamic PPI network is first constructed from the static PPI network and gene expression data. After data cleaning and screening, a weighted PPI network is generated from the dynamic network. The weighted network and an attribute network derived from Gene Ontology (GO) annotations are then fed into a variational graph autoencoder, which outputs a node matrix containing the vector representation of each node. Finally, protein complexes with a core-attachment structure are generated from the node matrix, and the generated complexes are evaluated on a range of metrics.

(2) MEA, a further optimization of the first algorithm, is proposed to improve both the efficiency and the accuracy of protein complex identification. First, a new, concise data cleaning and optimization procedure is introduced to reduce the algorithm's complexity. Second, an accelerated attributed network embedding method based on the variational autoencoder is proposed and combined with the biological attributes of the protein nodes to improve the expressive power of the network representation. Finally, a new core-attachment identification algorithm is incorporated to widen the range of detectable complex cores and to improve the accuracy with which attachment proteins are identified.

Both algorithms were evaluated extensively on multiple PPI network datasets. The experimental results show that DVCA and MEA perform well: compared with several high-precision identification algorithms, they achieve higher identification accuracy and better running efficiency. In addition, the strategies used here for building dynamic networks, incorporating biological attributes, cleaning network data, embedding networks, and identifying core-attachment structures may serve as a reference for other protein complex identification algorithms. The algorithms could also be extended to solve problems on other kinds of networks.
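The abstract does not say how DVCA turns the static network and gene expression data into a dynamic, weighted network. One common pattern in the dynamic-PPI literature is to mark a protein as active at a time point when its expression exceeds a per-protein threshold, keep an edge at that time point only if both endpoints are active, and then weight each static edge by how often it survives. The Python sketch below illustrates that pattern under stated assumptions: the threshold rule `mu + k*sigma*(1 - 1/(1 + sigma**2))`, the parameter `k`, the frequency-based weights, and all function names are hypothetical, not the thesis's confirmed method.

```python
from statistics import mean, pstdev

def active_threshold(series, k=3.0):
    """Per-protein activity threshold (assumed rule, not from the thesis):
    mu + k * sigma * (1 - F), with F = 1 / (1 + sigma**2)."""
    mu = mean(series)
    sigma = pstdev(series)
    return mu + k * sigma * (1.0 - 1.0 / (1.0 + sigma ** 2))

def dynamic_subnetworks(edges, expression, k=3.0):
    """Split a static PPI edge list into one subnetwork per time point.

    edges:      list of (u, v) protein pairs from the static network
    expression: dict protein -> expression values, one per time point
    An edge is kept at time t only if both endpoints are active at t.
    """
    n_times = len(next(iter(expression.values())))
    thresholds = {p: active_threshold(s, k) for p, s in expression.items()}
    active = [
        {p for p, s in expression.items() if s[t] >= thresholds[p]}
        for t in range(n_times)
    ]
    return [[(u, v) for u, v in edges if u in active[t] and v in active[t]]
            for t in range(n_times)]

def edge_weights(edges, subnetworks):
    """Hypothetical weighting: fraction of time points at which an edge is present."""
    counts = {e: 0 for e in edges}
    for sub in subnetworks:
        for e in sub:
            counts[e] += 1
    return {e: c / len(subnetworks) for e, c in counts.items()}

edges = [("A", "B"), ("B", "C")]
expr = {"A": [1, 5, 1, 1], "B": [1, 6, 1, 1], "C": [1, 1, 1, 7]}
subs = dynamic_subnetworks(edges, expr, k=1.0)
print(subs)                       # [[], [('A', 'B')], [], []]
print(edge_weights(edges, subs))  # {('A', 'B'): 0.25, ('B', 'C'): 0.0}
```

The toy example uses a smaller `k` because its series have only four time points. The data cleaning and screening step mentioned in the abstract (for example, dropping edges whose weight is zero) would sit between these two functions and is not modeled here.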
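The abstract describes feeding the weighted network and a GO-derived attribute network into a variational graph autoencoder to obtain a node matrix. To illustrate that model family only, the NumPy sketch below implements the forward pass and loss of a standard two-layer VGAE: a GCN encoder, Gaussian latent vectors sampled with the reparameterization trick, and an inner-product decoder. Treating GO annotations as a binary protein-by-term feature matrix, and the names `vgae_forward` and `vgae_loss`, are assumptions. Training (gradient updates of the weight matrices) is omitted, and neither the thesis's exact architecture nor MEA's accelerated variant is reproduced here.

```python
import numpy as np

def normalize_adjacency(adj):
    """GCN-style symmetric normalization: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def vgae_forward(adj, features, w0, w_mu, w_logstd, rng):
    """Forward pass of a two-layer VGAE.

    adj:      (n, n) weighted adjacency of the PPI network
    features: (n, f) node attributes, e.g. a binary protein x GO-term matrix (assumed)
    Returns sampled embeddings z, reconstructed edge probabilities, mu, log_std.
    """
    a_norm = normalize_adjacency(adj)
    hidden = np.maximum(a_norm @ features @ w0, 0.0)  # first GCN layer + ReLU
    mu = a_norm @ hidden @ w_mu                       # mean of q(z | X, A)
    log_std = a_norm @ hidden @ w_logstd              # log std of q(z | X, A)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(log_std) * eps                    # reparameterization trick
    recon = 1.0 / (1.0 + np.exp(-(z @ z.T)))          # inner-product decoder
    return z, recon, mu, log_std

def vgae_loss(adj_target, recon, mu, log_std, eps=1e-9):
    """Binary cross-entropy reconstruction loss plus KL(q || N(0, I))."""
    bce = -np.mean(adj_target * np.log(recon + eps)
                   + (1.0 - adj_target) * np.log(1.0 - recon + eps))
    kl = -0.5 * np.mean(np.sum(1.0 + 2.0 * log_std - mu ** 2
                               - np.exp(2.0 * log_std), axis=1))
    return bce + kl

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
features = np.eye(3)  # placeholder features; real input would be GO attributes
w0 = rng.standard_normal((3, 4))
w_mu = rng.standard_normal((4, 2))
w_logstd = 0.1 * rng.standard_normal((4, 2))
z, recon, mu, log_std = vgae_forward(adj, features, w0, w_mu, w_logstd, rng)
print(z.shape, vgae_loss(adj, recon, mu, log_std))
```

After training, the rows of `mu` (or of `z`) play the role of the node matrix that the later core-attachment step consumes.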
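The abstract states that core-attachment complexes are generated from the node matrix but does not give the rule. One simple, hypothetical way to do this, sketched below in Python, is to grow a core around each seed protein from its interacting neighbours whose embeddings exceed a cosine-similarity threshold `tau`, and then attach any non-core protein that interacts with at least half of the core members. The threshold, the half-of-core attachment criterion, and the function names are illustrative assumptions, not the published DVCA or MEA procedure.

```python
import math

def cosine(u, v):
    """Cosine similarity of two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def core_attachment(adjacency, emb, tau=0.8, min_core=2):
    """Hypothetical core-attachment detection on node embeddings.

    adjacency: dict protein -> set of interacting proteins
    emb:       dict protein -> embedding vector (e.g. rows of a learned node matrix)
    Returns a list of (core, attachments) pairs as frozensets.
    """
    cores = []
    for seed, nbrs in adjacency.items():
        # Core: the seed plus neighbours whose embeddings are similar enough.
        core = {seed} | {n for n in nbrs if cosine(emb[seed], emb[n]) >= tau}
        if len(core) >= min_core and core not in cores:
            cores.append(core)
    complexes = []
    for core in cores:
        # Attachment: a non-core protein linked to at least half of the core.
        attach = {p for p in adjacency
                  if p not in core and 2 * len(adjacency[p] & core) >= len(core)}
        complexes.append((frozenset(core), frozenset(attach)))
    return complexes

adjacency = {"A": {"B", "C"}, "B": {"A", "C", "D"},
             "C": {"A", "B", "D"}, "D": {"B", "C"}}
emb = {"A": [1.0, 0.0], "B": [1.0, 0.1], "C": [1.0, 0.05], "D": [0.0, 1.0]}
result = core_attachment(adjacency, emb)
print(result)
```

In the toy network, A, B, and C have near-parallel embeddings and form the core, while D, whose embedding differs, joins as an attachment because it interacts with two of the three core proteins. Since every seed proposes its own core, different complexes may share attachment proteins.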
Keywords: Protein complexes, graph embeddings, core-attachment structure, graph autoencoders, dynamic protein networks