
Endogenization And Evolutionary Genomics Of Eukaryotic Non-retroviral Viruses

Posted on: 2012-05-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Q Liu
GTID: 1113330344452571
Subject: Plant pathology
Abstract/Summary:
Viruses are the most ancient, numerous, and adaptable biological entities known. They represent a vast and diverse source of novel genes, and their evolution can therefore also affect host evolution. The study of virus evolution provides an integrating framework not only for understanding the diversity of viruses and explaining the emergence of new viral diseases, but also for advancing our knowledge of host-virus interactions and revealing the exchange of genetic information between viruses and hosts. Large-scale comparison of viral genome sequences may provide a valuable way to study viral evolution and interactions. However, this approach is often limited by the data: the available viral samples are often both small and biased. Moreover, because viruses lack a geological fossil record, the study of virus evolution is confined to the present. Retroviruses normally integrate into the genome of the host cell as an obligate step in their replication strategy; occasionally they integrate into the germline genome of their host and become endogenous retroviruses that are inherited over millions of years. These endogenous retroviral sequences effectively represent the 'molecular fossils' of ancient viral genomes, preserving information about ancient virus-host interactions, and hence constitute an invaluable resource for reconstructing the long-term history of virus and host evolution. For non-retroviral viruses, which do not normally integrate their genomes into host DNA, the formation of 'viral fossils' should be far less likely.

On the one hand, we based our study on Sclerotinia sclerotiorum, a plant-pathogenic fungus long studied in our lab, from which we cloned and sequenced the complete genomes of novel mycoviruses. 
On the other hand, drawing on the increasing availability of bioinformatics databases, we used bioinformatics methods to discover genomic sequences of new viruses in EST databases by in silico cloning and, more importantly, to identify non-retroviral endogenous virus sequences, the 'molecular fossils' of ancient viral genomes, in eukaryotic genome databases by data mining. Finally, we combined the new viral genome sequences with related known viral sequences from the databases. Working squarely within the framework of comparative genomics, with phylogenetic analysis of viral gene and genome sequences as the main analytical tool, we reveal novel virus diversity, the widespread endogenization and host range of non-retroviral viruses, and the contribution of viruses to eukaryotic host evolution. We also discuss the origin and evolution of the relevant viruses, the potential integration mechanism of non-retroviral viruses, and virus-host interaction and evolution at the genome level. The main results of this study are as follows.

1. We cloned and sequenced a novel RNA virus, named Sclerotinia sclerotiorum RNA virus L (SsRV-L), from the debilitated strain Ep-1PN of S. sclerotiorum. The complete genomic sequence of SsRV-L is 6,043 nucleotides in length, excluding the poly(A) tail. Sequence analysis revealed a single open reading frame (ORF) encoding a protein that contains conserved methyltransferase, helicase, and RNA-dependent RNA polymerase (RdRp) domains and has significant sequence similarity to the replicase of Hepatitis E virus, a human virus belonging to the "alphavirus-like" supergroup of positive-strand RNA viruses. As far as we know, this is the first report of a positive-strand RNA mycovirus that is related to a human virus. 
Genome comparison and phylogenetic analysis of SsRV-L with representative members of the "alphavirus-like" supergroup showed that it clusters with the rubi-like viruses and is related to the plant clostero-, beny-, and tobamoviruses, the insect tetraviruses, and the vertebrate hepeviruses and rubiviruses. Moreover, the viral phylogeny is consistent with the host phylogeny, suggesting that the progenitor of these viruses arose in ancient times, possibly before the separation of fungi, plants, and animals, and subsequently co-evolved with its hosts over a long evolutionary history. This finding has potentially far-reaching implications for understanding the origin and evolution of this large evolutionary lineage of RNA viruses as well as the emergence of new viruses. In addition, we presented convincing evidence that SsRV-L can replicate independently with only a slight impact on the growth and virulence of its host. These results represent a significant contribution to future studies on the basis of virus-mediated hypovirulence in this plant-pathogenic fungus.

2. We cloned and sequenced a novel monopartite dsRNA virus, named S. sclerotiorum dsRNA mycovirus L (SsMV-L), from the virulent strain Sunf-M of S. sclerotiorum. The complete genomic sequence of SsMV-L is 9,124 nucleotides in length and has no poly(A) tail. Sequence analysis revealed two large ORFs (ORF1 and ORF2); the 5'-untranslated region (UTR) and 3'-UTR are 1,088 and 54 bp in length, respectively. ORF1 is predicted to encode a 1,034-aa protein that contains a partial sequence of the conserved Sugar ISomerase domain, but its function is unknown. ORF2 is predicted to encode a 1,337-aa protein containing the conserved RdRp domain characteristic of RNA viruses, suggesting that it functions as the viral replicase. 
Genome comparison and phylogenetic analysis of SsMV-L with related dsRNA viruses revealed that SsMV-L represents a species of a new taxon of monopartite dsRNA viruses and that the current taxonomy of monopartite dsRNA viruses is inadequate. Hence, establishing new virus families, or new genera within the existing family Totiviridae, should be considered to accommodate the different viral evolutionary lineages. In addition, the phylogeny suggests that the ancestor of the chrysoviruses, whose genomes comprise four segments, is likely to have originated from monopartite dsRNA viruses. Intriguingly, a 'phytoreo S7 domain' was found downstream of the RdRp domain in the putative replicase of SsMV-L. This domain family consists of the P7 proteins of phytoreoviruses, which are known to be viral core proteins with nucleic acid-binding activity. PSI-BLAST searches showed that the S7 domain is also present in various RNA viruses, including chrysoviruses, endornaviruses, and some unclassified monopartite dsRNA viruses. Domain organization and phylogenetic analysis suggested that the S7 domain sequences were most likely derived from those of ancestral phytoreoviruses and then underwent multiple horizontal gene transfers (HGTs) among diverse RNA viruses. This finding provides convincing evidence that recombination events have occurred between very distantly related virus families from different host taxa, and it reveals a macroevolutionary mechanism of dsRNA viruses.

3. We cloned and sequenced a novel bipartite dsRNA virus, named S. sclerotiorum partitivirus S (SsPV-S), from strain Sunf-M. The genome of SsPV-S comprises two segments, S-1 and S-2, each containing one ORF. S-1 is 1,856 bp in length and encodes an RdRp; S-2 is 1,783 bp in length and encodes a coat protein (CP). Comprehensive phylogenetic analysis of SsPV-S with all known partitiviruses led to the identification of four major clades. 
One clade consists of a mixture of plant partitiviruses (genus Alphacryptovirus) and fungal partitiviruses (genus Partitivirus), suggesting that horizontal transfer of members of the family Partitiviridae between fungi and plants has most likely occurred. It also suggests that the current classification of partitiviruses does not reflect their true evolutionary relationships, and therefore the taxonomy of the family Partitiviridae will probably need to be reconsidered. Intriguingly, the SsPV-S CP has its highest aa sequence similarity to IAA-leucine-resistant protein 2 (ILR2) of Arabidopsis; its similarity to the CPs of other partitiviruses is considerably lower. This raises the interesting possibility that HGT may have occurred between partitiviruses and the genome of an Arabidopsis ancestor.

4. We identified large numbers of novel viral sequences similar to partiti-, toti-, chryso-, or endornaviruses in the NCBI EST database by in silico cloning. Among these are 106 partitivirus-like sequences, a number that almost doubled the known partitiviral species. The viral sequences obtained in this study not only represent previously unknown viruses, but many also come from new hosts or even new host taxa. For example, many partiti- or endornavirus-like sequences are from animals, although these viruses have not so far been reported to infect animals. More importantly, comprehensive phylogenetic analysis of these new viral sequences with related known dsRNA viruses revealed the long-term history of virus-host evolution and interaction: the progenitors of these viruses arose in ancient times, possibly before the separation of the host supergroups, and are likely to have subsequently co-evolved with their hosts over long evolutionary timescales, with frequent host changes. 
This study demonstrates the potential of in silico virus cloning for discovering novel viruses directly from databases, which can greatly increase our knowledge of viral diversity, host ranges, and virus-host interaction and evolution.

5. We conducted a systematic search for sequences related to known dsRNA viruses in the publicly available eukaryotic genome databases. The results show that the RdRp and CP genes of partitiviruses and totiviruses have been widely endogenized into a broad range of eukaryotic genomes. Altogether, 22 partitivirus and 34 totivirus RdRp- or CP-like sequences were identified in the nuclear genomes of more than 20 eukaryotic organisms, including plants, arthropods, fungi, nematodes, and protozoa. PCR amplification, sequencing, and comparative analysis support the conclusion that these viral homologs are real and occur in eukaryotic genomes. Sequence comparison and phylogenetic analysis further demonstrated that these endogenous viral sequences were derived from the endogenization of partitiviruses and totiviruses. Given that many of the endogenous viral sequences were found in eukaryotic species not previously known to be infected by partitiviruses or totiviruses, our findings extend the host range of these viruses. Through analysis of the conservation and expression of the endogenous viral genes, we found that some of them, such as the partitiviral CP-like genes in Arabidopsis and Chinese cabbage (Brassica rapa) and the partitiviral RdRp-like gene in the fruit fly Drosophila grimshawi, are not only conserved but also expressed. In particular, the ILR2 gene, a homolog of the partitivirus CP, has been demonstrated to function in regulating the synthesis of the auxin indole-3-acetic acid (IAA). 
Hence, our findings imply that horizontal transfer of double-stranded RNA viral genes is widespread among eukaryotes and may give rise to functionally important new genes, which suggests that RNA viruses may play significant roles in the evolution of eukaryotes.

6. We performed extensive sequence similarity searches for sequences related to known linear ssDNA viruses in the publicly available eukaryotic genome databases. The results show that parvoviruses and densoviruses have been widely endogenized into a broad range of eukaryotic genomes. Altogether, 62 nonstructural protein (NS)-like and 77 CP-like sequences of parvoviruses were identified in the nuclear genomes of 37 eukaryotic organisms, including mammals, fish, birds, and tunicates; 92 NS-like and 44 CP-like sequences of densoviruses were identified in the nuclear genomes of 9 eukaryotic organisms, including crustaceans, arachnids, insects, and flatworms. PCR amplification, sequencing, and comparative analysis support the conclusion that these viral homologs are real and occur in eukaryotic genomes. It is worth noting that some animal lineages (such as fishes, tunicates, and flatworms) are not known to be infected by parvoviruses. However, many endogenous parvoviral sequences were found in their genomes, clearly suggesting that these species can also be infected by parvoviruses, or at least were in the past. Sequence comparison and phylogenetic analysis suggested that many of the endogenous viral sequences are ancient, dating back at least millions of years. In particular, the identification of orthologous endogenous parvoviral CP-like sequences in the genomes of humans and other mammals suggests that parvoviruses have coexisted with mammals for at least 98 million years, which implies that these viruses are much older than previously thought. As far as we know, this is the oldest 'viral fossil' known. 
In addition, we show that some of the endogenous viral genes are expressed, suggesting that parvoviruses might act as an unforeseen source of genetic innovation in their hosts. In summary, our discovery provides a fossil record of past viral invasions, thereby helping to shed light on the evolutionary history of viruses and hosts and advancing our knowledge of host-virus interactions.

7. We performed comprehensive sequence similarity searches for sequences related to known circular ssDNA viruses in the publicly available eukaryotic genome databases. The results show that sequences related to geminiviruses, nanoviruses, and circoviruses occur widely in a broad range of eukaryotic genomes. Altogether, 31 replication initiation protein (Rep)-like sequences and 1 CP-like sequence of geminiviruses were identified in the nuclear genomes of 12 eukaryotic organisms, including plants, fungi, and protozoans; 271 Rep-like and 2 CP-like sequences of nanoviruses and circoviruses were identified in the nuclear genomes of 23 eukaryotic organisms, including green algae, diatoms, invertebrates, and vertebrates. PCR amplification, sequencing, and comparative analysis support the conclusion that these viral homologs are real and occur in eukaryotic genomes. Through comprehensive sequence comparison and phylogenetic analysis of the endogenous circular ssDNA virus-like sequences with related known viruses and eukaryotic or bacterial rolling-circle replicating (RCR) plasmids, our studies not only revealed the diversity of circular ssDNA viruses and their wide host range, but also reconstructed the long-term history of virus and host evolution and advanced our understanding of the evolution of geminiviruses, nanoviruses, and circoviruses. Furthermore, we demonstrated that some of the endogenous viral genes are conserved and expressed, suggesting that these genes are also functional in the host genomes. 
We also identified a geminivirus-like transposable element in the genomes of fungi and a parvovirus-like transposable element in the genomes of lower animals, thereby providing direct evidence that eukaryotic transposons can derive from related viruses. This reveals that the capture and functional assimilation of exogenous viral genes may represent an important force in eukaryotic evolution.
Keywords/Search Tags: Sclerotinia sclerotiorum, mycovirus (fungal virus), integration, horizontal gene transfer, endogenous virus, phylogenetic analysis, virus evolution, RNA virus, DNA virus