Methods And Application Of Selective Pressure Estimation And Evolutionary Dynamics Study Of Mammalian Minimal Introns

Posted on: 2012-11-03; Degree: Doctor; Type: Dissertation
Country: China; Candidate: D P Wang
GTID: 1220330467480023; Subject: Bioinformatics
Abstract/Summary:
As sequencing technology develops rapidly, mammalian genome sequence data are being acquired in large quantities and at great speed. This gives us an opportunity to develop more efficient and accurate comparative-genomics methods for identifying which genes are the most variable or the most conserved, and for understanding their functions and evolutionary dynamics. The ratio of the nonsynonymous substitution rate (Ka) to the synonymous substitution rate (Ks) is widely used as an indicator of selective pressure at the sequence level among species, and diverse mutation models have been incorporated into several methods for computing it. We previously developed the γ-MYN method, which captures a key dynamic evolutionary trait of DNA sequences by allowing mutation rates to vary across sites. In this work, we report a further improvement of the NG, LWL, MLWL, LPB, MLPB, and YN methods: we introduce a gamma distribution to model the variation of the raw mutation rate over sites, and denote the resulting methods γ-NG, γ-LWL, γ-MLWL, γ-LPB, γ-MLPB, and γ-YN, respectively. We also investigate how variable substitution rates affect methods that adopt different models, as well as the interplay among four evolutionary features with respect to Ka/Ks computation. Our results suggest that variable substitution rates over sites have opposite effects on ω (Ka/Ks) estimates under negative selection and under positive selection. We believe that the new methods are more sensitive and reliable than the original methods under diverse conditions. We have also incorporated these gamma-series methods into the updated stand-alone toolkit KaKs_Calculator2.0.
In addition, we adopted a sliding-window technique in the extension tool of this package to identify positively selected regions.
We chose genome data from thirteen vertebrates (twelve mammals and one bird) to analyze human protein-coding genes and their orthologs by calculating Ka and Ks. After evaluating eight commonly used methods for calculating Ka and Ks, we found that these methods yielded more consistent results for Ka than for Ks (or Ka/Ks). The possible explanations are: heterogeneity of real datasets; the scarcity of nonsynonymous substitutions, which renders complex models and multiple-hit corrections ineffective; effects of dissimilar evolutionary features that may counteract each other; and saturation of Ks at large evolutionary distances, which may lead to unstable estimates. When sorting genes by Ka, we noticed that fast-evolving and slow-evolving genes often belonged to different functional classes in a lineage-specific manner. In particular, we identified two functional classes of genes in the acquired immune system. Fast-evolving genes coded for signal-transducing proteins, such as receptors, ligands, cytokines, and CDs, whereas slow-evolving genes coded for function-modulating proteins, such as kinases and adaptor proteins. Our results indicated that the functional specializations of the three major mammalian clades were: sensory perception and oncogenesis in primates, reproduction and hormone regulation in large mammals, and immunity and angiotensin in rodents. Our study suggests that Ka can be used as a parameter to sort genes by evolutionary rate and can also provide a way to categorize common protein functions in defined lineages or subgroups.
Introns are an important and sizable component of the eukaryotic genome, and the debate over their exact function and over the natural selection acting on them is still ongoing.
In this study, we carried out a comprehensive analysis of twelve mammalian genomes and found that the intron length distribution of each species is bimodal. We then focused on minimal introns (50-150 nt) and found that more than half of the minimal intron-containing genes (MIGs) have only one minimal intron. Minimal introns occur in both the 5'-end and 3'-end regions of genes, but more MIGs harbor minimal introns near the 3' end. We observed differences between the primitive mammals and the other mammals in several evolutionary features of minimal introns, suggesting that the conservation mechanism of minimal introns may be specific to the mammalian lineage. Next, we analyzed 179 re-sequenced individual genomes from three major world populations and found two major effects in minimal intron evolution. Size-effect: minimal introns of 88-124 nt tend to have a higher ratio of deletions to insertions than those of 50-86 nt. GC-effect: minimal introns with lower GC content (<65%) tend to have more deletions than those with higher GC content (>65%). The GC-effect results in minimal introns having a higher GC content than their flanking exons, in contrast to larger introns (≥125 nt), which always have a lower GC content than their flanking exons. We also observed that the two effects are distinguishable but not completely separable within and between populations. We validated the unique mutation dynamics that keep minimal introns near their optimal size and GC content, and our observations suggest potentially important functions of human minimal introns in transcript processing and gene regulation.
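The size and GC criteria above can be illustrated with a minimal Python sketch. It is not the dissertation's pipeline: the function names, the treatment of an intron with exactly 65% GC as "high", and the handling of zero insertions are all assumptions made for illustration. Only the 50-150 nt range and the 65% GC threshold come from the abstract.

```python
import math

# Thresholds taken from the abstract; everything else here is hypothetical.
MIN_LEN, MAX_LEN = 50, 150   # minimal-intron length range, in nt
GC_THRESHOLD = 0.65          # split used for the GC-effect


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def is_minimal_intron(seq: str) -> bool:
    """True if the intron length falls in the 50-150 nt range."""
    return MIN_LEN <= len(seq) <= MAX_LEN


def gc_class(seq: str) -> str:
    """'low' below the 65% threshold, otherwise 'high'.

    Assumption: exactly 65% is treated as 'high'; the abstract
    does not say how that boundary case is handled.
    """
    return "low" if gc_content(seq) < GC_THRESHOLD else "high"


def deletion_insertion_ratio(n_del: int, n_ins: int) -> float:
    """Ratio of deletion count to insertion count for a group of introns.

    Assumption: returns inf when there are deletions but no insertions,
    and nan when both counts are zero.
    """
    if n_ins == 0:
        return math.inf if n_del > 0 else math.nan
    return n_del / n_ins
```

Grouping introns by `is_minimal_intron` and `gc_class` and then comparing `deletion_insertion_ratio` between groups mirrors the kind of comparison behind the size-effect and GC-effect, under the assumptions stated above.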
Keywords/Search Tags: selective pressure, evolutionary rate, minimal introns, mammal