
Studies On Identifying Protein Complex And Functional Module Algorithms Based On Multi-Source Biological Data

Posted on: 2021-05-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J X Zhang
GTID: 1360330611967064
Subject: Computer Science and Technology

Abstract/Summary:
Protein complexes are involved in most biological processes in cells. A protein functional module is composed of proteins that participate in a specific biological process. An integrated understanding of protein functional modules plays a key role in elucidating protein function, and computational analysis of protein complexes and functional modules is a critical approach to understanding life activities in the cell. With the development of computational methods and models and with new integration strategies, the incorporation of proteomics data into modeling frameworks, the identification of protein complexes, the analysis of dynamic protein complexes, and the detection of functional modules will contribute to medical research and offer broad prospects for disease diagnosis and targeted therapy.

Early computational methods for identifying protein complexes and functional modules focused mainly on the topological features of protein-protein interaction (PPI) networks. As biological data have accumulated, many researchers have attempted to incorporate other biological information into PPI networks in order to identify protein complexes and functional modules more accurately. Based on PPI data, other multi-source biological data, and the essential characteristics of protein complexes and functional modules, this dissertation analyzes the properties of protein complexes and functional modules and investigates methods for identifying them. The main results are as follows.

First, according to the spatio-temporal constraints on the formation of protein complexes in living organisms, a joint co-localization criterion and a joint co-expression criterion are established. A functional homogeneity criterion for protein complexes is proposed based on their biological relevance. Furthermore, the observation that protein complexes correspond to densely and reliably linked regions of the PPI network is analyzed, and a method for identifying protein complexes from static PPI networks, called ICJointLE, is designed based on the core-attachment structure and a seed-expanding strategy. Experimental results on the PPI networks STRING, BioGRID, DIP, Uetz, Ito, and Yu indicate that, compared with existing representative methods, ICJointLE performs better in terms of the number of perfect matches (#PM), the f-measure (the harmonic mean of precision and recall), and the comprehensive score #PM×FAM, and can identify more protein complexes of size 2-6.

Second, owing to cell-cycle progression and cellular responses to environmental changes, protein interactions vary over time. To describe these dynamic changes objectively, methods for constructing temporal dynamic protein interaction networks (TPNs) and for generating temporal interval dynamic protein interaction networks (TI-PINs) are investigated. The constructed TI-PINs not only eliminate interfering interactions but also preserve interactions that persist throughout a time interval. A novel method called ICJointLE-DPN is then devised to identify protein complexes from the constructed TI-PINs. It can accurately identify stable and permanent protein complexes and can also capture transient protein complexes that appear at a single time point. By combining each of two yeast gene expression data sets with each of three yeast PPI data sets, six TI-PINs are constructed, and ICJointLE-DPN is applied to each of them. The experimental results show that ICJointLE-DPN accurately identifies more protein complexes from the TI-PINs than from the static PPI network. Compared with existing methods, ICJointLE-DPN on the whole accurately identifies more protein complexes and performs better in terms of the number of perfect matches (#PM), recall (rec), the f-measure (fm), maximum matching ratio (MMR), composition score (FAM), and comprehensive score #PM×FAM.

Third, most existing methods for identifying functional modules emphasize mining modular structure from a topological perspective. In addition, constructing dynamic PPI networks may split a functional module into several parts that appear in distinct network snapshots. As a result, existing methods fail to consider the functionality and the integrity of a functional module, in the biological sense, at the same time. Building on the temporal dynamic protein interaction networks described above, functionally associated, significantly co-expressed, reliably linked temporal dynamic PPI networks (FER-TPNs) are constructed. Non-zero diagonal entries are used to represent cross-network edges in the temporal weighted adjacency matrices, and a novel method, called IFM-FER-TPNs, is proposed to identify functional modules from the constructed FER-TPNs. IFM-FER-TPNs adopts a strategy that gives priority to a high in-module connectivity ratio in order to find functional modules with highly cohesive expression relevance. It uses a local dense-connectivity criterion to identify functional modules whose members are not densely connected as a whole. It applies the seed-expanding strategy to search for time-stamped protein nodes across multiple networks, so that proteins scattered over those networks are clustered into functional modules. This cross-network search mechanism removes the risk that constructing dynamic protein interaction networks fragments functional modules. The experimental results indicate that, compared with existing algorithms, IFM-FER-TPNs on the whole accurately identifies more functional modules and performs better overall in terms of the number of perfect matches (#PM), recall (rec), the f-measure (fm), maximum matching ratio (MMR), composition score (FAM), and comprehensive score #PM×FAM.

The results of this dissertation will enrich and promote the development of algorithms for identifying protein complexes and functional modules.
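
The evaluation measures named in the abstract (perfect matches, precision, recall, and their harmonic mean) can be illustrated with a minimal Python sketch. The abstract does not say how a predicted complex is matched to a reference complex, so the overlap score |A∩B|²/(|A|·|B|) and the 0.2 threshold used here are assumptions borrowed from common practice in this literature, and the function names are hypothetical. MMR and FAM are not reproduced because the abstract does not define them.

```python
from typing import Dict, FrozenSet, Iterable, List


def overlap_score(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Assumed matching score: |A ∩ B|^2 / (|A| * |B|); 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) ** 2 / (len(a) * len(b))


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(predicted: Iterable[Iterable[str]],
             reference: Iterable[Iterable[str]],
             threshold: float = 0.2) -> Dict[str, float]:
    """Compare predicted complexes with reference complexes.

    `threshold` is an assumed cut-off on the overlap score, not a value
    taken from the dissertation.
    """
    pred: List[FrozenSet[str]] = [frozenset(p) for p in predicted]
    ref: List[FrozenSet[str]] = [frozenset(r) for r in reference]

    # A predicted complex counts as correct if it matches any reference complex.
    matched_pred = sum(
        1 for p in pred if any(overlap_score(p, r) >= threshold for r in ref)
    )
    # A reference complex counts as recovered if any prediction matches it.
    matched_ref = sum(
        1 for r in ref if any(overlap_score(p, r) >= threshold for p in pred)
    )

    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_ref / len(ref) if ref else 0.0

    # Perfect matches: distinct predictions identical to a reference complex.
    perfect = len(set(pred) & set(ref))

    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure(precision, recall),
        "perfect_matches": perfect,
    }


if __name__ == "__main__":
    predicted = [{"A", "B", "C"}, {"D", "E"}, {"X", "Y"}]
    reference = [{"A", "B", "C"}, {"D", "E", "F"}, {"G", "H"}]
    print(evaluate(predicted, reference))
```

In the toy example, two of the three predictions match a reference complex and two of the three reference complexes are recovered, so precision, recall, and the f-measure all equal 2/3, with one perfect match.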
Keywords/Search Tags:protein complexes, functional modules, identification algorithm, joint co-localization, joint co-expression, static PPI networks, temporal interval dynamic PPI networks