Research Of Protein Network Based On The Multiple Biological Information

Posted on: 2014-09-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X W Tang
Full Text: PDF
GTID: 1260330401979289
Subject: Computer Science and Technology
Abstract/Summary:
Our research involves the construction of time-course protein networks, the identification of essential proteins, and the prediction of protein complexes via the integration of multiple sources of biological information.

Cellular systems are highly dynamic and responsive to cues from the environment. Cellular function and response patterns to external stimuli are regulated by biological networks. A protein-protein interaction (PPI) network with static connectivity is dynamic in the sense that the nodes implement so-called functional activities that evolve in time. The shift from static to dynamic network analysis is essential for further understanding of molecular systems.

The dissertation constructs Time Course Protein Interaction Networks (TC-PINs) by incorporating time series gene expression data into PPI networks. To evaluate whether the dynamic networks are valid, the following steps are carried out. First, a clustering algorithm is used to create functional modules from three kinds of networks: the TC-PINs, a static PPI network and a pseudorandom network. Next, for the functional modules from the TC-PINs, both duplicate modules and nested modules are removed. Last, matching and GO enrichment analyses are performed on the functional modules detected from those networks to evaluate them. The comparative analysis shows that the functional modules from the TC-PINs are much more biologically significant than those from static PPI networks.

Essential proteins are vital for an organism’s viability under a variety of conditions. Many experimental and computational methods have been developed to identify essential proteins. Computational prediction of essential proteins based on the global protein-protein interaction (PPI) network is severely restricted by the insufficiency of the PPI data, but gene expression profiles help to make up for this deficiency. In the dissertation, a novel method for predicting essential proteins, WDC, is proposed.
The Pearson correlation coefficient (PCC) is used to bridge the gap between PPI and gene expression data. Based on the PCC and the Edge Clustering Coefficient (ECC), a new centrality measure, the weighted degree centrality (WDC), is developed to achieve reliable prediction of essential proteins. To evaluate its performance, WDC is applied to identify essential proteins in the yeast PPI network. For comparison, other prediction methods are also applied to identify essential proteins, and several evaluation methods are used to analyze the results of the various approaches. The prediction results and comparative analysis are presented in the dissertation. Furthermore, the parameter λ in WDC is analyzed in detail and an optimal λ value is determined. Based on the optimal λ value, the differences between WDC and another prediction method, PeC, are discussed. The results show that WDC outperforms the other state-of-the-art methods. The analysis also suggests that integrating different data sources is an effective way to predict essential proteins.

Protein complexes are a cornerstone of many biological processes, and together they form various types of molecular machinery that perform a vast array of biological functions. An increase in the amount of protein-protein interaction (PPI) data has enabled a number of computational methods for predicting protein complexes. At present, most complex-detection algorithms consider only the PPI data. However, PPI data from high-throughput techniques is flooded with false interactions, and the insufficiency of the PPI data significantly lowers the accuracy of these methods.

The dissertation presents a novel method, CMBI, to discover protein complexes via the integration of gene expression profiles, essential protein information and PPI data.
First, CMBI defines the functional similarity of each pair of interacting proteins based on the edge-clustering coefficient (ECC) from the PPI network and the Pearson correlation coefficient (PCC) from the gene expression data. Second, CMBI selects essential proteins as seeds to build the protein complex cores. During the growth process, the seeds’ essential protein neighbors and the neighbors whose functional similarity (FS) with the seeds is greater than the threshold T are added to the complex cores. After the complex cores are constructed, CMBI generates protein complexes by attaching to the cores their direct neighbors with FS>T. In addition to the essential proteins, CMBI also uses other proteins as seeds to expand protein complexes. To assess the performance of CMBI, the complexes it discovers are compared with those found by other techniques by matching the predicted complexes against the reference complexes. Subsequently, GO::TermFinder is used to analyze the complexes predicted by the various methods. Finally, the effect of the parameter T is investigated.

The results from GO functional enrichment and matching analysis show that CMBI performs significantly better than the state-of-the-art methods. This indicates that integrating different biological information is an effective way to predict protein complexes in the PPI network.

The dissertation also proposes a novel general algorithm for predicting protein complexes from a high-confidence protein network.
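
As an illustration of how ECC and PCC might be combined into an edge weight and summed into a weighted degree, here is a minimal Python sketch. The abstract does not give the exact formula, so the linear combination `lam * ECC + (1 - lam) * PCC` is an assumption, as are all function names, the default `lam=0.5`, and the toy data; it should not be read as the dissertation's implementation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A constant profile has no defined correlation; treat it as 0 here.
    return cov / (sx * sy) if sx and sy else 0.0


def build_adjacency(edges):
    """Undirected adjacency sets; each edge is assumed to be listed once."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_clustering_coefficient(adj, u, v):
    """Triangles through edge (u, v) divided by the maximum possible number."""
    triangles = len(adj[u] & adj[v])
    possible = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return triangles / possible if possible > 0 else 0.0


def weighted_degree_centrality(edges, expression, lam=0.5):
    """Sum of assumed edge weights lam*ECC + (1-lam)*PCC over each node's edges."""
    adj = build_adjacency(edges)
    scores = {node: 0.0 for node in adj}
    for u, v in edges:
        ecc = edge_clustering_coefficient(adj, u, v)
        pcc = pearson(expression[u], expression[v])
        weight = lam * ecc + (1 - lam) * pcc  # assumed combination rule
        scores[u] += weight
        scores[v] += weight
    return scores


# Hypothetical toy network and expression profiles, for illustration only.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
expression = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8],
    "C": [1, 3, 2, 5],
    "D": [4, 3, 2, 1],
}
scores = weighted_degree_centrality(edges, expression, lam=0.5)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Proteins at the top of `ranking` would be the candidates for essentiality under this assumed weighting. The same per-edge weight could also serve as a functional similarity score to compare against a threshold such as T, although the abstract does not say whether CMBI combines ECC and PCC in this way.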
Keywords/Search Tags: Protein-Protein Interactions, Gene Expression Profiles, Functional Modules, Dynamic Protein Networks, Edge-Clustering Coefficient, Pearson Correlation Coefficient, Essential Proteins, Protein Complexes