Research On Genic Function By Clustering On Protein Network

Posted on: 2007-08-04; Degree: Doctor; Type: Dissertation
Country: China; Candidate: H C Lu; Full Text: PDF
GTID: 1118360185454192; Subject: Computer system architecture
Abstract/Summary:
In the post-genome era, the objective of functional genomics is to decode the functions of genes and to control them. Because proteins that interact with each other tend to have similar cellular functions, we can follow the route "interactions -> network -> functions" and infer the potential functions of unknown proteins by comparing them with proteins of known function, using the latest protein-protein interaction network and other high-throughput experimental data. Starting from research on network visualization, clustering on multi-source data with temporal and spatial information, and modularized clustering, we plan to develop new algorithms for problems in functional proteomics and to compare them with other methods in several ways. Finally, we will combine the methods with network visualization software and systematically analyze the protein functions of budding yeast.

We provide a simple but information-rich approach to visualization that integrates topological and biological information. In our method, topological information such as the quasi-cliques or spoke-like modules of the network is extracted into a clustering tree, and biological information ranging from protein functional annotation to expression profile correlations can be annotated onto that tree. We have developed a software package named PINC based on this approach. Compared with previous clustering methods, our clustering method ADJW performs well both in retaining a meaningful image of the protein interaction network and in enriching that image with biological information, and is therefore more suitable for visualizing the network.

We present a simple hierarchical clustering algorithm that goes a long way toward integrating high-throughput data into investigations of the systematic and dynamic organization of biological networks. Our method effectively reveals the modular structure of the yeast protein-protein interaction network and distinguishes protein complexes from functional modules by integrating high-throughput protein-protein interaction data with subcellular localization and expression profile data. Furthermore, we use the detected modules to provide a reliable functional context for the uncharacterized components within them. In addition, integrating several kinds of protein-protein association information makes our method more robust to false positives, especially for the derived protein complexes.

A new method called the modularized clustering method (MCM), which is based on the direct and second-order interactions of modules, is applied to the latest high-throughput protein-protein interaction network of yeast to predict the functions of unknown proteins in the modules. The P value of the hypergeometric cumulative distribution for each module and a perturbation analysis of the data (adding, removing, and rewiring interactions) are used to evaluate the prediction quality and robustness of the method. The results show that MCM has high prediction precision and coverage, and that it is robust both to data with a high false-positive rate and to missing data. Predictions for unknown proteins made with high precision can be instructive in biological analysis, and the algorithm can be generalized to other networks with similar structures.

We designed software for visualizing the PPI network by clustering. The software integrates several clustering algorithms commonly used in PPI network analysis, employs both new and traditional visualization methods, and combines topological and biological information. It offers a convenient platform for analyzing protein interaction networks on different operating systems.

In addition, a spectral method derived from graph theory was introduced to uncover hidden topological structures (i.e., quasi-cliques and quasi-bipartites) in complicated protein-protein interaction networks. Our analyses suggest that these hidden topological structures are consistent with biologically relevant functional groups. This result motivates a new method for predicting the functions of uncharacterized proteins based on the classification of the known proteins within each topological structure. Using this spectral analysis method, 48 quasi-cliques and 6 quasi-bipartites were isolated from a network of 11,855 interactions among 2,617 proteins in budding yeast, and 76 uncharacterized proteins were assigned functions.

We propose a mathematical model to estimate the evolution rate of the SARS coronavirus genome and the time of the last common ancestor of the sequenced SARS strains. Under some common assumptions and justifiable simplifications, a few simple equations involving the evolution rate (K) and the time of the last common ancestor of the strains (T0) can be deduced. We then used the least-squares method to estimate K and T0 from the dataset of sequences and their corresponding times. Monte Carlo simulation was employed to assess the results. Based on 6 strains with accurate dates of host death, we estimated a time of the last common ancestor that is consistent with epidemiological investigations, and an evolution rate in the same range as that reported for the HIV-1 virus.
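
As an illustration of the module evaluation mentioned in the abstract (the P value of the hypergeometric cumulative distribution), the following is a minimal Python sketch of the standard hypergeometric enrichment test. It is not taken from the dissertation: the function name `hypergeom_enrichment_pvalue`, its parameter names, and the numbers in the usage example are hypothetical, and the thesis may define or correct the P value differently.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, module_size, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    population  -- number of proteins in the network (assumed background)
    annotated   -- proteins in the network carrying the function of interest
    module_size -- number of proteins in the module being scored
    hits        -- proteins in the module carrying that function
    All names are illustrative, not from the dissertation.
    """
    if not 0 <= annotated <= population:
        raise ValueError("annotated must lie between 0 and population")
    if not 0 <= module_size <= population:
        raise ValueError("module_size must lie between 0 and population")

    # The overlap cannot exceed either the module or the annotated set.
    max_hits = min(annotated, module_size)
    total = comb(population, module_size)

    # Sum the probabilities of observing `hits` or more annotated proteins
    # when module_size proteins are drawn without replacement.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, module_size - i)
        for i in range(max(hits, 0), max_hits + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Made-up toy numbers: 10 proteins, 4 annotated, a module of 3 with 2 hits.
    print(hypergeom_enrichment_pvalue(10, 4, 3, 2))
```

A small P value means the function is over-represented in the module relative to chance, which is the usual basis for transferring that function to the module's uncharacterized proteins. When many modules and functions are tested, a multiple-testing correction is normally applied on top of this raw value.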
Keywords/Search Tags: Functional Genomics, Systems Biology, Protein Interaction Network, Clustering, Visualization of Network, Data Fusion, Protein Complex, Prediction of Protein Function