Research On Metabolic Network-based Phylogenetic Reconstruction

Posted on: 2010-06-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T T Zhou
GTID: 1100360305973644
Subject: Computer Science and Technology
Abstract/Summary:
Phylogenetic reconstruction aims to recover the evolutionary relationships between species. It is an important field of biological research with far-reaching scientific and practical impact. In the current post-genomic era, the explosion in the quantity of generated sequence data makes phylogenetic reconstruction an obligatory step in understanding life processes. Such phylogenetic reconstruction must be carried out at the network level, which leads to a strategy based on metabolic network comparison. Metabolic network-based phylogenetic reconstruction is less easily influenced by the gene operations that are unavoidable in traditional molecular phylogenetic reconstruction. For this reason, it increasingly attracts attention and is the target of new developments.

Currently, metabolic network-based phylogenetic reconstruction strategies are carried out in three distinct steps: 1) the reconstruction of metabolic networks and their abstraction into sets of metabolites or enzymes; 2) the calculation of evolutionary distances using set-theoretic distance functions; and 3) the reconstruction of phylogenetic trees. However, problems persist in some of these key steps, notably in metabolic network reconstruction and evolutionary distance determination. For example, because of various defects in current approaches, reconstructed metabolic networks are usually not accurate enough. In addition, owing to the lack of adequate software tools, the reconstructed networks published in different studies are usually inconsistent. Furthermore, since set-theoretic network comparison methods neglect the differences between individual enzymes, the resulting evolutionary distances lack accuracy. Our work on metabolic network-based phylogenetic reconstruction is designed to tackle these problems using bioinformatics techniques.
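Step 2 of the pipeline described above can be illustrated with a minimal sketch, assuming each metabolic network has already been abstracted into a set of enzyme identifiers. The organism labels and EC numbers are hypothetical toy data, and the function names are chosen here for illustration; the sketch shows the plain Jaccard distance as one common set-theoretic distance, not necessarily the exact function used in any particular study.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Plain Jaccard distance between two sets: 1 - |A & B| / |A | B|."""
    union = a | b
    if not union:
        # Two empty networks are treated as identical (an assumption).
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(networks):
    """Build a symmetric distance lookup keyed by (organism, organism)."""
    names = sorted(networks)
    dist = {(n, n): 0.0 for n in names}
    for x, y in combinations(names, 2):
        d = jaccard_distance(networks[x], networks[y])
        dist[(x, y)] = dist[(y, x)] = d
    return dist


# Hypothetical toy data: organism label -> set of enzyme (EC) identifiers.
networks = {
    "org_a": {"1.1.1.1", "2.7.1.1", "4.1.2.13"},
    "org_b": {"1.1.1.1", "2.7.1.1", "5.3.1.9"},
    "org_c": {"6.3.1.2"},
}

dist = distance_matrix(networks)
for (x, y), d in sorted(dist.items()):
    print(f"{x} {y} {d:.3f}")
```

The resulting matrix is what a tree-building method would consume in step 3. Note that every enzyme counts equally here, which is the limitation the abstract attributes to set-theoretic comparison.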
Our contributions are summarized as follows.

First, we proposed a new approach to accurately and quickly reconstruct metabolic networks using data retrieved from the KEGG database. The correct reconstruction of metabolic networks is the foundation for research on metabolic network-based phylogenetic reconstruction. Previous approaches to reconstructing metabolic networks from KEGG data suffer from various flaws that introduce erroneous data during the reconstruction process. In addition, these methods cannot keep up with the rapid updates of the KEGG database. Together, these drawbacks severely reduce the usefulness and correctness of the reconstructed networks. To address these problems, we developed a new metabolic network reconstruction approach. It uses the organism-specific pathway data in the KEGG/PATHWAY database and the metabolic pathway hierarchy data in the KEGG/Orthology database as the main data sources for network reconstruction. Our design uses a divide-and-conquer algorithm based on multi-branched recursion. Compared with previous approaches, it offers several advantages. For example, by calling the KEGG web service we ensure that the data we use is both correct and up to date. The design and deployment of a local relational database saves time in data retrieval and network reconstruction.

Our second contribution is the development of two software applications that implement the proposed network reconstruction approach. The first step of any research on metabolic networks is usually the reconstruction of the networks themselves, and this step is not easy to carry out without user-friendly software. However, currently available software is mainly designed for network visualization or data integration, and hardly meets the requirements of batch reconstruction of metabolic networks or direct network computation. To remedy these deficiencies, we developed MetaGen and MetAtlas, two applications for metabolic network reconstruction. In both cases, we make use of advanced software engineering techniques and relational database designs. This paves the way for research on metabolic networks, including metabolic network-based phylogenetic reconstruction. MetaGen works in console mode and is a batch-processing network reconstruction tool, while MetAtlas is a plug-in for Cytoscape, one of the most popular software packages for biological network visualization and analysis. Because Cytoscape is a WYSIWYG (what you see is what you get) application, MetAtlas can not only reconstruct metabolic networks interactively but also contribute directly to computational network analysis by working together with other Cytoscape plug-ins. Both MetaGen and MetAtlas are platform-independent, database-independent and highly extensible. They are distributed as open source under the GNU LGPL and are available at http://bnct.sourceforge.net/.

As a third contribution, we proposed a new model for determining evolutionary distances, in which the differences between individual enzymes in evolutionary conservation and topological importance are considered for the first time. In previous studies on metabolic network-based phylogenetic reconstruction, metabolic networks are usually regarded as node sets, and evolutionary distances are usually determined using set-theoretic distances. These approaches are not sufficiently sound because they neglect the differences between individual nodes. In this thesis we proposed a new model, WJD, which determines evolutionary distances by using the evolutionary conservation and the topological importance of individual enzymes as weights. Using the 16S rRNA-based evolutionary distance as a reference, we compared the four distances derived from the WJD model with the plain Jaccard distance. We showed that the distances produced by the WJD model yield smaller errors in all comparison test cases.

Finally, we made a first attempt to introduce the principles of Information Retrieval into the field of metabolic network-based phylogenetic reconstruction and proposed a new reconstruction model. Although the WJD model considers the differences between individual enzymes when determining the evolutionary distance, it is still a set-theoretic approach in which the structural differences between networks are not fully considered. To address this problem, we borrowed the concept of the Vector Space Model from Information Retrieval research and applied it to metabolic network-based phylogenetic reconstruction. We proposed a new model called TopEVM. In this model, having obtained the co-occurrence patterns and the topological patterns of enzymes, we combine them into a weight vector that represents the organism-specific metabolic network. By comparing these organism-specific vectors, we obtain the evolutionary distance matrix and then reconstruct the phylogenetic tree. We compared the trees obtained by TopEVM with those produced by previous methods, using the NCBI Taxonomy trees as a reference. This comparison shows that TopEVM constructs trees that are much closer to those of the NCBI Taxonomy than those obtained by previous methods.

In summary, we concentrated our research on two key aspects of metabolic network-based phylogenetic reconstruction, namely metabolic network reconstruction and network-based evolutionary distance determination, in both cases using bioinformatics concepts. We proposed a new recursive approach to rapidly and reliably reconstruct metabolic networks using data from KEGG; we developed two applications to facilitate network reconstruction; we proposed a model for determining evolutionary distances that uses the evolutionary conservation and topological importance of enzymes; and we introduced the principles of Information Retrieval into research on phylogenetic reconstruction and proposed a new model. All of these contributions should help to advance research on metabolic network-based phylogenetic reconstruction.
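The two distance ideas summarized above can be sketched as follows. The abstract does not give the actual WJD or TopEVM formulas, so both functions are assumed, generic forms: a weight-normalized Jaccard distance and a cosine distance between sparse weight vectors. The enzyme names, weights, and function names are hypothetical.

```python
import math


def weighted_jaccard_distance(a, b, weight):
    """Assumed weighted Jaccard form: shared weight over total weight.

    `weight` maps an enzyme to a non-negative importance score
    (for example, a conservation or topology score). Enzymes that
    have no entry default to a weight of 1.0.
    """
    union = a | b
    total = sum(weight.get(e, 1.0) for e in union)
    if total == 0:
        return 0.0
    shared = sum(weight.get(e, 1.0) for e in a & b)
    return 1.0 - shared / total


def cosine_distance(u, v):
    """Cosine distance between two sparse vectors stored as dicts."""
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in keys)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        # A zero vector shares nothing with any other vector (assumption).
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical toy data: e1 is treated as a highly important enzyme.
weight = {"e1": 3.0, "e2": 1.0, "e3": 1.0}
net_a = {"e1", "e2"}
net_b = {"e1", "e3"}

# Vector-space view: each organism becomes an enzyme -> weight vector.
vec_a = {e: weight[e] for e in net_a}
vec_b = {e: weight[e] for e in net_b}

print(round(weighted_jaccard_distance(net_a, net_b, weight), 3))
print(round(cosine_distance(vec_a, vec_b), 3))
```

In this toy case the two networks share only the heavily weighted enzyme, so both distances come out smaller than the plain Jaccard distance would be for the same sets. That is the kind of effect that weighting individual enzymes is meant to capture.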
Keywords/Search Tags: phylogenetic reconstruction, metabolic network, bioinformatics, network reconstruction, evolutionary distance, set theory, information retrieval