Research On Protein Complex Identification Algorithms Based On Protein-protein Interaction Networks

Posted on: 2021-02-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: R Q Wang
GTID: 1360330623977398
Subject: Computer application technology
Abstract/Summary:
The completion of the Human Genome Project was the prelude to the study of proteomics. Proteins are not only the material basis of life activities but also the direct embodiment of cell activities. Most proteins seldom act alone; they usually interact with one another and form protein complexes to perform different cellular functions. In bioinformatics, a protein complex is a group of proteins that physically interact at the same time and place to accomplish a specific biological function or to take part in a functional process. At present, identifying protein complexes by experimental methods is time-consuming and costly, and the number of complexes that can be identified this way is limited. With the development of high-throughput laboratory techniques, a great number of protein-protein interaction networks (PPINs) have been generated, which makes it possible to use graph clustering algorithms to identify protein complexes automatically from PPINs. Protein complexes are the carriers of cellular functions and functional processes in living cells. They are therefore widely regarded as vital for studying the principles of cellular organization and functional mechanisms in systems biology, and they also help to interpret the mechanisms underlying the diagnosis and treatment of complex diseases.

This dissertation takes real PPINs as the basis of the research, uses the topological structure of PPINs and various biological data as the point of entry, and applies complex network theory and machine learning to the identification of protein complexes. Considering the shortcomings of current protein complex identification algorithms and the challenges of the problem, several protein complex identification algorithms are designed. The research work is as follows:

1. We introduce related work on protein complex detection algorithms and the topological and biological characteristics of PPINs. We also describe the problem of protein complex identification. Together these provide the basis for designing protein complex identification algorithms.

2. Because PPINs contain noise, and because the accuracy and efficiency of existing identification algorithms need further improvement, we present a protein complex identification algorithm based on an edge weight method and the core-attachment structure (EWCA). First, we propose a weighting method to assess the reliability of interactions. Then, we identify protein complex cores using structural similarity. Next, we introduce a novel algorithm that detects and distinguishes overlapping proteins and peripheral proteins, which together yield the attachment proteins. Finally, protein complexes are formed by combining protein complex cores with their attachment proteins, and redundant protein complexes are discarded. The experimental results indicate that EWCA effectively improves the performance of protein complex identification in terms of F-measure, MMR and CR, and that it identifies many more protein complexes with biological significance. Last, we apply EWCA to two slightly larger PPINs and compare it with several high-accuracy identification methods; the results show that EWCA achieves higher accuracy and better efficiency.

3. To mine protein complexes with various densities and modularities, as well as overlapping protein complexes, we propose a novel algorithm based on seed expansion, density and modularity that uses topological structure and GO annotations (SE-DMTG). First, the weight of each interaction edge is calculated as the arithmetic average of a common-neighbor measure and a GO annotation measure. Then, we define a seed selection strategy that combines the weighted degree and the local weighted clustering coefficient, and we use it to construct a seed queue. Finally, to further improve the accuracy of protein complex identification, a seed-expansion method and a protein complex model are proposed to identify protein complexes. The experimental results show that SE-DMTG achieves favorable accuracy and matching ratio on yeast compared with thirteen classical algorithms. The performance of SE-DMTG is further verified on many PPINs from several species, and the results indicate that SE-DMTG has strong adaptability. In addition, the protein complexes identified by SE-DMTG have clear biological significance.

4. Existing protein complex identification algorithms identify only static or only dynamic protein complexes, and existing protein complex models do not reflect the inherent structure of protein complexes. We therefore present a novel graph clustering method for mining protein complexes from dynamic and static PPINs based on a clustering model (MPC-C). First, MPC-C constructs a probabilistic dynamic PPIN from gene expression data and constructs weighted dynamic and static PPINs by combining topological structure and functional annotation. Second, initial clusters are obtained by identifying core proteins and multifunctional proteins in the weighted dynamic and static PPINs, after which a greedy heuristic search algorithm and a novel clustering model are used to identify protein complexes. Finally, according to the clustering model, unreliable and highly overlapping protein complexes are discarded. To demonstrate the performance of MPC-C, we test it on five PPINs and compare it with ten effective methods. The experimental results indicate that MPC-C significantly outperforms the other state-of-the-art methods with respect to various computational and biologically relevant metrics.

In summary, the proposed algorithms and the comparison algorithms are tested on many PPINs from different species, and the experimental results demonstrate the strong performance of the proposed algorithms. We also enumerate a large number of the identified protein complexes and analyze their functional enrichment, which indicates that they are likely to be real protein complexes. The protein complex identification algorithms proposed in this dissertation also provide a reference for mining community structure in other types of complex networks, which we will study further in future work.
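
The abstract describes the SE-DMTG edge weighting and seed ranking only in words, so the following Python sketch is an illustration under stated assumptions rather than the dissertation's actual method. It assumes that both the common-neighbor measure and the GO measure are Jaccard similarities, and that the seed score is the product of weighted degree and local weighted clustering coefficient. All function names, and the toy protein and GO labels in the usage lines, are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Return {protein: set(neighbors)} for an undirected edge list (no self-loops)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def edge_weights(adj, go_terms):
    """Weight each edge by the arithmetic mean of two similarities.

    Assumption: topology is scored by Jaccard similarity of closed
    neighborhoods, and function by Jaccard similarity of GO term sets.
    """
    weights = {}
    for u in adj:
        for v in adj[u]:
            if u < v:  # store each undirected edge once, keyed by sorted pair
                topo = jaccard(adj[u] | {u}, adj[v] | {v})
                func = jaccard(go_terms.get(u, set()), go_terms.get(v, set()))
                weights[(u, v)] = (topo + func) / 2
    return weights


def weight_of(weights, u, v):
    """Look up the weight of an undirected edge regardless of argument order."""
    return weights[(u, v) if u < v else (v, u)]


def weighted_degree(adj, weights, u):
    """Sum of the weights of all edges incident to u."""
    return sum(weight_of(weights, u, v) for v in adj[u])


def local_weighted_clustering(adj, weights, u):
    """Total weight of edges among u's neighbors over the number of neighbor pairs."""
    k = len(adj[u])
    if k < 2:
        return 0.0
    linked = sum(
        weight_of(weights, a, b)
        for a, b in combinations(sorted(adj[u]), 2)
        if b in adj[a]
    )
    return linked / (k * (k - 1) / 2)


def seed_queue(adj, weights):
    """Rank proteins by score, highest first; ties are broken by name.

    Assumption: the two measures are combined by multiplication.
    """
    score = {
        u: weighted_degree(adj, weights, u) * local_weighted_clustering(adj, weights, u)
        for u in adj
    }
    return sorted(adj, key=lambda u: (-score[u], u))


# Toy usage with made-up proteins and GO labels.
toy_adj = build_adjacency([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
toy_go = {"A": {"g1", "g2"}, "B": {"g1", "g2"}, "C": {"g1"}, "D": {"g3"}}
toy_weights = edge_weights(toy_adj, toy_go)
print(seed_queue(toy_adj, toy_weights))
```

In this toy network the triangle members with matching annotations rank ahead of the hub that also touches a dangling, functionally unrelated protein, which is the kind of behavior a seed strategy of this type is meant to produce. A seed-expansion step would then grow a candidate complex from the head of the queue.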
Keywords/Search Tags:Protein-protein interaction networks, protein complexes, graph clustering methods, core-attachment structure, complex network, community structure, greedy heuristic search algorithms