Protein Function Prediction Based On Protein-Protein Interaction Networks

Posted on: 2014-05-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Xiong
GTID: 1220330434971352
Subject: Computer software and theory

Abstract/Summary:
Proteins are essential macromolecules of life and participate in virtually every process within cells. Studies of protein function facilitate the understanding of life activities, clinical therapeutics, pharmaceutical design and so on. However, experimental determination of protein function is both expensive and time-consuming, and hence can no longer keep pace with the rapid development of the contemporary life sciences. The recent advent of high-throughput experimental biology has generated vast amounts of protein-protein interaction (PPI) data. As a result, a number of prediction approaches based on PPI networks have been proposed, and protein function prediction based on PPI networks has become one of the most important research topics in bioinformatics. This thesis focuses on protein function prediction based on PPI networks. Its major contributions are as follows:

1. A novel collective classification based approach is proposed to predict protein function, which combines protein sequence information and PPI information to improve prediction performance. We first reconstruct a PPI network by adding a number of computed edges based on protein sequence similarity, and then apply a collective classification algorithm to predict protein function on the new PPI network. Experimental results demonstrate that our approach outperforms most existing PPI network based approaches, and that adding implicit edges does improve prediction performance. The results also show that our approach is robust to the number of labeled proteins in the PPI network.

2. Network reconstruction and edge enrichment are two major types of approaches for constructing more reliable PPI networks. However, a systematic performance comparison between these two types of approaches has been lacking. Therefore, we conduct a comprehensive performance comparison study with two functional annotation datasets. We first reconstruct and enrich PPI networks by using protein sequence similarity, local similarity indices and global similarity indices, and then compare the prediction performance of these reconstructed and enriched networks with that of the original networks. The experimental results demonstrate that the enriched networks achieve more accurate predictions than the original networks and the reconstructed networks in most cases. The results also show that sequence similarity is more effective than global similarity and local similarity in network enrichment.

3. An active learning based approach is proposed to predict protein function, which achieves better prediction performance by choosing the most informative proteins for labeling. We first cluster a PPI network by using the spectral clustering algorithm and select appropriate candidates for labeling within each cluster by using three common graph-based centrality metrics (degree centrality, closeness centrality and betweenness centrality), and then apply a collective classification algorithm to predict protein function based on these annotated proteins. The experimental results confirm that selecting the most informative proteins for labeling improves prediction performance, and that betweenness centrality is more effective than degree centrality and closeness centrality in most cases.

4. We explore the network characteristics of cancer proteins in human and yeast PPI networks. We first identify four sets of (?) non-cancer essential proteins and control proteins, and then map these proteins into the yeast PPI network by homologous genes. Finally, we compare their network properties by using three common network topological measures (degree centrality, clustering coefficient and betweenness centrality). Experimental results demonstrate that cancer proteins tend to have a higher degree and a lower clustering coefficient than non-cancer proteins. This means that cancer proteins are central to the human PPI network and strongly connected, but their neighbors are less likely to be connected to one another.
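
As an illustration of the three topological measures named in the last contribution, the following is a minimal Python sketch on a toy network. It is not the thesis's implementation: the protein names (P1 to P5), the edge list and the helper function names are all hypothetical. Degree centrality is assumed to be normalized by n - 1, and betweenness is computed with Brandes' algorithm for unweighted, undirected graphs and left unnormalized.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Build undirected adjacency sets from a list of (u, v) interactions."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def clustering_coefficient(adj):
    """Fraction of each node's neighbor pairs that are themselves connected."""
    cc = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            cc[v] = 0.0  # fewer than two neighbors: no pairs to connect
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        cc[v] = 2 * links / (k * (k - 1))
    return cc


def betweenness_centrality(adj):
    """Unnormalized betweenness via Brandes' algorithm (BFS from every node)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []                      # nodes in order of non-decreasing distance
        preds = {v: [] for v in adj}    # predecessors on shortest paths from s
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)  # dependency of s on each node
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}


# Hypothetical toy network: P1 is a hub; only P2 and P3 among its neighbors interact.
ppi = build_graph([("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
                   ("P1", "P5"), ("P2", "P3")])
deg = degree_centrality(ppi)
cc = clustering_coefficient(ppi)
btw = betweenness_centrality(ppi)
print(deg["P1"], round(cc["P1"], 3), btw["P1"])  # 1.0 0.167 5.0
```

In this toy graph the hub P1 has the highest degree and betweenness but a low clustering coefficient, because only one of its six neighbor pairs is connected. That is the same qualitative pattern the abstract reports for cancer proteins, shown here only on made-up data.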
Keywords/Search Tags: Protein function prediction, Protein-protein interaction network, Collective classification