Prediction and Annotation of LTR Retrotransposons in Plants and a New Method to Construct Phylogenetic Trees

Posted on: 2012-06-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H J Yu
Full Text: PDF
GTID: 1110330371469165
Subject: Bioinformatics
Abstract/Summary:
Retrotransposons in plants are genetic elements that can amplify themselves in a genome through RNA intermediates. These elements are ubiquitous in almost all eukaryotic organisms and are normally a principal component of their nuclear DNA. In addition, LTR (Long Terminal Repeat) retrotransposons have been found to participate in gene regulation, evolution and other genetic functions. They are important for understanding the function and evolution of plants.

With the development of sequencing technology, the genomic sequences available online are increasing rapidly, which provides both a challenge and an opportunity to identify LTR elements in these raw sequences by computational means. Based on LTR_FINDER, an efficient tool for searching for LTR elements in genomic sequences, we build a system for whole-genome prediction, classification and annotation of LTR retrotransposons. Applying this system to grape (Vitis vinifera), a newly published eudicot genome, we discover 2,686 reliable full-length elements and annotate at least 32,000 copies derived from LTR elements, which constitute at least 14.0% of the available genomic sequence. They are classified into 168 families based on sequence similarity. According to the order of their coding domains, LTR elements in plants can be divided into two primary superfamilies, namely Gypsy and Copia. There are 33 families comprising 810 elements in the Gypsy superfamily and 114 families comprising 1,475 elements in the Copia superfamily. The rest, lacking the necessary domains, are categorized as TRIMs or LARDs according to their length.

Eukaryotic transposable elements are considered to have an ancient origin, and the major LTR element clades of angiosperms existed before the species diverged. Phylogenetic analysis of the reverse transcriptase (RT) and integrase (INT) domains reveals that all the common plant LTR clades are present in the grape genome. Gypsy elements in plants can be divided into two branches.
One is Athila/Tat, and the rest belong to the Chromovirus branch, whose members contain a chromatin organization modifier domain (chromodomain) at the C-terminal end of their integrases. Athila/Tat elements are especially active, with high copy numbers in the grape genome, and some families within this branch are much longer than those in others. The copy numbers of Chromoviruses are relatively low, and they are less active. The classification of the Copia superfamily is still a work in progress. In our results, Copia is split into nine major clades. Only a few families have high copy numbers: 20% of the families account for about 80% of the sequences derived from LTR elements. Molecular paleontology shows that the elements in grape are relatively young, most of them having transposed within the last two million years. EST (Expressed Sequence Tag) evidence suggests that about half of the families may still be transposing in the grape genome. LTR elements do not directly determine the function of an organism and are not indispensable. However, Ka/Ks analysis reveals that they are under strong positive selection, and the pressure on the Gypsy superfamily seems much stronger. Using the BLASTN program, we search other angiosperms with whole-genome sequences available and find that many elements have orthologous sequences. In the poplar genome in particular, the BLASTN search returns a large number of hits longer than 1 kbp. We find that at least six elements are highly conserved between grape and poplar at the DNA level, even more conserved than their coding genes. Nevertheless, there are no large conserved regions in other eudicot species. We therefore infer that these six elements may have been transferred horizontally between these two species recently.

Phylogenetic analysis is key to understanding the evolutionary relationships of genes or organisms, and much work has been done in this field.
When using CVTree, a method that constructs phylogenetic trees from whole-genome sequences without sequence alignment, to reconstruct the phylogenetic relationships of fungi, we find that the topology of closely related taxa can be changed by adding taxa at a large evolutionary distance. We design a method for constructing phylogenetic trees, which we call Neighbor Clustering. It starts from a distance matrix. Treating a single taxon, or a set of taxa connected by branches, as a network, we first connect the two networks with the smallest distance and then update the distances between networks. The procedure continues until all taxa are connected in one network, which is our final unrooted tree. In our method, the topology of a small network is affected only by close taxa, while distant taxa have no influence. We first use computer simulation to evaluate the performance of our method. Given a rooted tree as a model tree, we generate sequences for the leaf nodes. We then construct a tree from these sequences and compare it with the model tree. Our results show that the method is efficient. In particular, when long branches exist, our method seems better than Neighbor Joining (NJ). From the distance matrices obtained with the CVTree method, we construct two realistic trees, one for fungi and another for bacteria. From the same matrix, different methods give topologies that differ to a greater or lesser extent, and we compare our result with that given by NJ in detail.
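
The greedy joining loop described for Neighbor Clustering can be illustrated with a minimal sketch. The abstract does not say how the distance between two networks is updated after a join, so the average-linkage rule used here is an assumption made only for illustration, and the function name `neighbor_clustering_sketch` is hypothetical. The thesis method may differ in both the update rule and the way branches are attached.

```python
from itertools import combinations


def neighbor_clustering_sketch(names, dist):
    """Greedily join 'networks' of taxa, starting from a distance matrix.

    ASSUMPTION: the distance between two networks is the mean of all
    pairwise taxon distances (average linkage). The abstract does not
    specify the update rule, so this is a stand-in, not the thesis method.
    """
    # Each network is (topology as nested tuples, member taxon indices).
    networks = [(name, [i]) for i, name in enumerate(names)]

    def net_dist(a, b):
        # Mean distance over all taxon pairs drawn from the two networks.
        pairs = [(i, j) for i in a[1] for j in b[1]]
        return sum(dist[i][j] for i, j in pairs) / len(pairs)

    while len(networks) > 1:
        # Pick the two networks with the smallest distance.
        a, b = min(
            combinations(range(len(networks)), 2),
            key=lambda p: net_dist(networks[p[0]], networks[p[1]]),
        )
        merged = (
            (networks[a][0], networks[b][0]),
            networks[a][1] + networks[b][1],
        )
        # Replace the two joined networks with the merged one.
        networks = [n for k, n in enumerate(networks) if k not in (a, b)]
        networks.append(merged)

    # The last join is read as an edge, so the nesting is an unrooted topology.
    return networks[0][0]


if __name__ == "__main__":
    names = ["A", "B", "C", "D"]
    dist = [
        [0, 2, 10, 10],
        [2, 0, 10, 10],
        [10, 10, 0, 2],
        [10, 10, 2, 0],
    ]
    print(neighbor_clustering_sketch(names, dist))
    # prints (('A', 'B'), ('C', 'D'))
```

In this toy matrix the close pairs A/B and C/D are joined before either pair meets the distant taxa, which mirrors the stated property that the topology of a small network is shaped only by nearby taxa.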
Keywords/Search Tags: bioinformatics, plant genome evolution, LTR retrotransposon, Vitis vinifera, horizontal transfer, phylogeny analysis, neighbor cluster