Codon Analysis And Its Application In Bioinformatics And Evolutionary Studies

Posted on: 2007-06-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W J Zhang
GTID: 1100360212984352
Subject: Bioinformatics
Abstract/Summary:
Codon analysis and its applications in bioinformatics and evolutionary studies are important for investigating genome evolution, protein function, and the interaction between genetics and environment. In this study, we pursue three lines of investigation: comparative genome analysis of codon usage patterns, phylogenetic tree construction based on codon usage, and detection of adaptive evolution using codon substitution models. The main results of this thesis are as follows:

1. The vast majority of prokaryotic and eukaryotic species have non-random codon usage. Alternative synonymous codons in most genes are used with unequal frequency, i.e., certain synonymous codons are significantly preferred over others. Synonymous codon usage analysis is useful for identifying genomic features and for understanding the molecular evolution of genomes. We analyze synonymous codon usage patterns and their underlying driving forces among different genomes within one species and among the genomes of different species.

a) In many organisms, differences in codon usage patterns among genes reflect variation in local base compositional biases and in the intensity of natural selection. In this study, a comparative analysis is performed to investigate the characteristics of codon bias and the factors shaping codon usage patterns among mitochondrial, chloroplast and nuclear genes in common wheat (Triticum aestivum). GC content is higher in nuclear genes than in mitochondrial and chloroplast genes. The neutrality and correspondence analyses indicate that codon usage in nuclear genes is likely a result of relatively strong mutational bias, whereas the codon usage patterns of mitochondrial and chloroplast genes are more conserved in GC content and are influenced by translation level. The Parity Rule 2 (PR2) plot analysis shows that pyrimidines are used more frequently than purines at the third codon position in all three genomes.
In addition, using a new alternative strategy, 11, 12, and 24 triplets are defined as 'preferred' codons in the mitochondrial, chloroplast and nuclear genes, respectively. These findings suggest that the mitochondrial, chloroplast and nuclear genes have distinctly different codon usage features and evolutionary constraints.

b) We compare codon usage between monocot and dicot plant species based on mitochondrial and nuclear genes. Genes of dicot plants are more conserved in GC content. The neutrality and correspondence analyses indicate that codon usage in nuclear genes is similar to that in mitochondrial genes. High GC content may be present in monocot plants, especially in the Poaceae. The codon usage patterns in nuclear genes of monocot plants reflect nucleotide composition bias shaped by mutational constraints.

c) The amino acid usage pattern discriminates genes coding for seed storage proteins (SSP) from other genes among 590 complete nuclear coding DNA sequences of Triticum aestivum (wheat). In this study, correspondence analysis of codon usage patterns separates the gene members of the SSP subunit families from each other in the space of the axes generated by the analysis, suggesting that members of a gene family cluster together because of similar codon usage.

2. Based on the hypothesis that codon usage patterns are similar in closely related species, we propose a new method that works on unaligned virus sequences to construct phylogenetic trees using codon usage combined with other measures, such as length. The results for both genomes and protein-coding sequences are consistent with the expected trees.

3. The nonsynonymous-to-synonymous substitution rate ratio (d_N/d_S) is an important measure for evaluating selective pressure on protein-coding sequences. The maximum likelihood (ML) method with codon-substitution models is a powerful statistical tool for detecting amino acid sites under positive selection and adaptive evolution.
We analyze hepatitis C virus (HCV) envelope protein-coding sequences from 18 general genotypes/subtypes worldwide and find 4 amino acid sites under positive selection. Since these sites are located in different immune epitopes, it is reasonable to expect that our study has potential value for biomedicine. It also suggests that the ML method is an effective way to detect adaptive evolution in virus proteins with relatively high genetic diversity. Besides the ML method, the parsimony method of Suzuki and Gojobori is one of the widely used methods for detecting natural selection in homologous protein-coding sequences. We detect amino acid sites under positive selection in protein-coding sequences of HCV 1a and 1b. There are 3 and 33 sites under natural selection, and most of them are located in immune epitopes.

4. The influences of nucleotide composition bias and translational expression level on codon bias are difficult to distinguish. Here, we present an alternative strategy that removes the background effect of nucleotide composition and infers the preferred codons that are directly correlated with translational selection. We use three distinct datasets, from Escherichia coli, yeast and wheat, to assess the validity of our strategy. Comparison of the previous and new strategies suggests that our method is more universal and reliable because it is not influenced by nucleotide composition bias.
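
As an illustration of the quantities behind the codon usage analyses summarized above, the following is a minimal Python sketch, not the thesis's own code. It assumes the standard genetic code and uses relative synonymous codon usage (RSCU), a common measure of synonymous codon preference that the abstract does not name explicitly. It also computes third-position GC content (GC3) and the two ratios usually plotted in a PR2 plot, G3/(G3+C3) and A3/(A3+T3). The function names `codons`, `rscu` and `third_position_stats` are hypothetical, and GC3 is taken here over all codons in the sequence rather than over synonymous positions only.

```python
from collections import Counter, defaultdict

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codons(seq):
    """Split an in-frame coding sequence into codons, skipping ambiguous ones."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    triplets = (seq[i:i + 3] for i in range(0, usable, 3))
    return [t for t in triplets if t in CODON_TABLE]


def rscu(seq):
    """RSCU: observed codon count divided by the count expected if all
    synonymous codons for that amino acid were used equally."""
    counts = Counter(codons(seq))
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # stop codons are excluded
            families[aa].append(codon)
    values = {}
    for family in families.values():
        total = sum(counts[codon] for codon in family)
        if total == 0:
            continue  # amino acid absent from the sequence
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values


def third_position_stats(seq):
    """GC3 and the PR2 plot coordinates from third codon positions."""
    third = Counter(codon[2] for codon in codons(seq))
    n_a, n_t, n_g, n_c = (third[base] for base in "ATGC")

    def ratio(x, y):
        return x / (x + y) if (x + y) else None

    return {
        "GC3": ratio(n_g + n_c, n_a + n_t),
        "G3/(G3+C3)": ratio(n_g, n_c),  # PR2 plot x-axis
        "A3/(A3+T3)": ratio(n_a, n_t),  # PR2 plot y-axis
    }
```

For the toy sequence "CTGCTGCTGCTA" (three CTG and one CTA, all leucine), `rscu` gives 4.5 for CTG and 1.5 for CTA, since leucine has six synonymous codons, and `third_position_stats` gives a GC3 of 0.75. In a PR2 plot, a gene lying at 0.5 on both axes uses A and T, and G and C, equally at the third position; departures from that point indicate the kind of purine or pyrimidine bias described in result 1a.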
Keywords/Search Tags: Codon Usage Pattern, Bioinformatics, Molecular Evolution, Synonymous Codon, Adaptive Evolution