
Research On Identification And Application Of Protein Complexes In Protein-Protein Interaction Networks

Posted on: 2019-07-03 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: X X Liu
GTID: 1360330572953465 | Subject: Computer application technology
Abstract/Summary:
Proteins take part in virtually every activity in the body, and protein complexes are the main form in which proteins carry out their functions. Detecting protein complexes therefore helps to deepen our understanding of the mechanisms of biological activity. As protein interaction data have accumulated, many computational methods have been proposed to identify protein complexes from protein interaction networks. This dissertation designs different protein complex detection algorithms to meet the different demands of biologists. Its main contents are as follows.

First, the dissertation analyzes the biological and topological characteristics of six protein interaction networks that are widely used for protein complex detection. It then examines whether these characteristics affect the performance of protein complex detection methods.

When the protein interaction networks alone are given, a protein complex detection method based on protein node embeddings is proposed. The method first embeds the nodes of a network into a low-dimensional vector space and then uses the similarities between node embeddings to weight the edges of the network. It then applies a seed-extension method based on the topological properties of protein complexes to identify protein complexes in the weighted network. The experimental results show that weighting edges by node embedding similarity improves the reliability of the networks. Compared with existing protein complex detection methods that use only protein interaction networks, the proposed method achieves a higher F-score and identifies more complexes with high biological significance.

When a protein interaction network and the corresponding standard protein complexes are given, a protein complex detection method based on protein complex embeddings is proposed. This method reconstructs the protein interaction network based on node embedding similarities and introduces a protein complex embedding representation that describes the characteristics of known protein complexes. It uses a supervised learning method to identify candidate protein complexes in the network, and then applies a random forest method built on the protein complex embeddings to filter the candidates, so that only high-quality predicted protein complexes are output. The experimental results show that this random-forest filtering step effectively improves the quality of the predicted complexes and thus identifies more complexes with high biological significance.

When protein interaction networks from multiple species are given, a protein complex detection method based on multi-source network embeddings is proposed. Its aim is to overcome the false-negative and false-positive interactions in the networks and to make full use of the protein interactions from different species. The experimental results show that the proposed multi-source network embedding method not only preserves the structural information of each single-species network but also makes full use of the orthology information between proteins of different species. Moreover, the proposed method achieves a higher F-score than existing protein complex detection methods and identifies more protein complexes with high biological significance.

In addition, this dissertation proposes a protein-complex-based method for identifying orphan disease causative genes. The method combines the topological characteristics of the protein interaction network, GO annotations, and protein complex features to predict orphan disease causative genes. The experimental results show that using protein complexes improves the precision and F-score of orphan disease causative gene identification.

In summary, this dissertation proposes three protein complex detection methods for different demand conditions of biologists. The experimental results show that the proposed methods effectively reduce the noise in protein interaction networks. They also make full use of both the topological properties of the networks and the diverse characteristics of protein complexes, thereby improving protein complex detection performance. The research can also be extended to other tasks related to protein interaction networks.
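The node-embedding pipeline summarized above (embed nodes, re-weight edges by embedding similarity, then grow complexes from seeds) can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation. The node embeddings are taken as given, since the abstract does not say how they are learned. Cosine similarity is assumed as the similarity measure. The seed-extension rule stands in for the topology-based criteria the abstract mentions without detailing: greedy growth while weighted density stays at or above a hypothetical `min_density` threshold, with seeds ordered by weighted degree. All function and parameter names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Vector = List[float]
Edge = Tuple[str, str]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two embedding vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot / (nu * nv)


def weight_edges(edges: List[Edge], emb: Dict[str, Vector]) -> Dict[FrozenSet[str], float]:
    """Assumption: each PPI edge is re-weighted by the cosine similarity of its endpoints' embeddings."""
    return {frozenset((a, b)): cosine(emb[a], emb[b]) for a, b in edges}


def weighted_density(nodes: Set[str], w: Dict[FrozenSet[str], float]) -> float:
    """Sum of internal edge weights divided by the number of node pairs."""
    n = len(nodes)
    if n < 2:
        return 0.0
    total = sum(w.get(frozenset(p), 0.0) for p in combinations(nodes, 2))
    return total / (n * (n - 1) / 2)


def grow_from_seed(seed: str, adj: Dict[str, Set[str]],
                   w: Dict[FrozenSet[str], float], min_density: float) -> Set[str]:
    """Hypothetical rule: add the neighbour giving the highest density; stop below min_density."""
    cluster = {seed}
    while True:
        frontier = set().union(*(adj[n] for n in cluster)) - cluster
        best, best_d = None, -1.0
        for cand in sorted(frontier):  # sorted for deterministic tie-breaking
            d = weighted_density(cluster | {cand}, w)
            if d > best_d:
                best, best_d = cand, d
        if best is None or best_d < min_density:
            return cluster
        cluster.add(best)


def detect_complexes(edges: List[Edge], emb: Dict[str, Vector],
                     min_density: float = 0.7, min_size: int = 3) -> List[Set[str]]:
    w = weight_edges(edges, emb)
    adj: Dict[str, Set[str]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    # Seeds in order of decreasing weighted degree (ties broken by node name).
    strength = {v: sum(w[frozenset((v, u))] for u in adj[v]) for v in adj}
    seeds = sorted(adj, key=lambda v: (-strength[v], v))
    covered: Set[str] = set()
    complexes: List[Set[str]] = []
    for s in seeds:
        if s in covered:
            continue  # simplification: seeds inside an accepted complex are skipped
        c = grow_from_seed(s, adj, w, min_density)
        if len(c) >= min_size:
            complexes.append(c)
            covered |= c
    return complexes


if __name__ == "__main__":
    # Toy network: two triangles joined by one edge (C-D) between dissimilar embeddings.
    emb = {"A": [1.0, 0.0], "B": [1.0, 0.0], "C": [1.0, 0.0],
           "D": [0.0, 1.0], "E": [0.0, 1.0], "F": [0.0, 1.0]}
    edges = [("A", "B"), ("B", "C"), ("A", "C"),
             ("D", "E"), ("E", "F"), ("D", "F"), ("C", "D")]
    print(detect_complexes(edges, emb))
```

In the toy network the bridging edge C-D receives weight 0 because its endpoint embeddings are orthogonal. Adding D to {A, B, C} would therefore drop the weighted density to 0.5, below the assumed threshold of 0.7, and the two triangles are reported as separate complexes. This illustrates, in miniature, how embedding-based weights can down-weight interactions that topology alone would treat as equally reliable.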
Keywords/Search Tags: Protein Interaction Networks, Graph Embedding, Representation Learning, Machine Learning, Orphan Disease Causative Genes