Annotation And Comparative Study Of LTR Retrotransposons In Plant Genomes

Posted on: 2009-12-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Wang
Full Text: PDF
GTID: 1100360272488943
Subject: Bioinformatics
Abstract/Summary:
Transposable elements (TEs) are DNA fragments that can insert into new chromosomal locations. They have been found in virtually all eukaryotic genomes investigated so far. LTR retrotransposons (LTR elements) are class I TEs that transpose in a "copy and paste" mode via RNA intermediates. They are predominant components of large plant genomes, and their dynamics are well established as one of the major forces underlying the remarkable variation in plant genome size. In addition, LTR retrotransposons have been found to participate in gene regulation and other genetic functions, so the study of these elements is of great value to life science and biotechnology. Owing to rapid advances in sequencing technology, genomic sequences of many organisms are accumulating, which makes rapid identification of LTR retrotransposons in them a major challenge. To date, however, tools for efficient annotation of LTR elements in large-scale sequences remain very limited. The present work builds, for the first time, a system for whole-genome LTR retrotransposon annotation. The system integrates three functional units (an ab initio algorithm, comparative genomics and homology search) to predict reliable LTR elements in raw genomes.

The first unit, LTR_FINDER, predicts full-length LTR elements in four main steps. The first step identifies all exactly matched sequence pairs that satisfy distance restrictions, using a linear-time suffix-array algorithm. The second step tries to merge neighboring exactly matched pairs into longer, highly similar pairs. The third step finds element boundaries by aligning regions close to the boundaries and checking boundary "signal strings". The fourth step validates elements by recognizing sequences that encode important TE proteins in their internal domains. Finally, the program gathers this information and reports possible LTR retrotransposons at different confidence levels.

The second unit, LTR_INSERT, combines structural and evolutionary information to predict LTR
elements with high reliability. It identifies transposition events by comparing alleles in two genomes. The program first constructs a whole-genome alignment of two related genomes, then partitions the alignment into indels of suitable size and well-aligned blocks. Next, it discovers LTR elements in the indels, most of which represent transposition events that occurred after the divergence of the two genomes. Finally, the algorithm discovers elements in well-aligned regions by recognizing their structural characteristics; these elements mainly represent insertions that predate the divergence.

We have developed a third program (Unit III) to annotate LTR copies in a genome based on the results of LTR_FINDER or LTR_INSERT. This program adjusts the boundaries of elements predicted by LTR_FINDER, identifies unrelated sequences inside elements, classifies elements into families and discovers all copies of each family. The three units can be used independently or in combination: LTR_FINDER plus Unit III is used when scanning a single genome, while LTR_INSERT plus Unit III is used when two closely related genomes are available. In summary, the three units provide three lines of support (structural characteristics, insertion signals and multiple copies) for each predicted element.

By applying LTR_INSERT to two rice genomes, we identify 993 full-length LTR elements, annotate a total of 15,916 copies related to them, and discover 80 novel LTR families. Through the full-length elements we observe two important events in the evolution of Asian cultivated rice: 1) The two subspecies have interacted extensively through inter-(sub)species nonreciprocal homologous recombination (ISNR) as recently as within the last 53,000 years. Large-scale sampling of protein-coding genes, intergenic regions and random sites shows that this phenomenon is not restricted to retrotransposons and that at least 15% of the genome has experienced ISNR recently. 2) LTR elements provide two independent lines of evidence that the two genomes diverged about 600,000 years ago. At the whole-genome level, this work for
the first time confirms that very recent ISNR is an important force shaping the modern cultivated rice genome, and estimates the incidence of such recombination. Investigation of the amplification patterns of rice LTR families in time and space shows that: 1) LTR retrotransposons have been active since the divergence of the two subspecies, and their activity has been relatively balanced between the two lineages. 2) 80% of post-divergence insertions were driven by a highly active 20% of families. 3) These predominant families were already active in the common ancestor, and the divergence event did not significantly alter their activity. 4) The distribution of LTR elements across the rice genomes is non-random: they tend to cluster in centromeres and at the 5' ends of chromosomes. The main trend is for LTR density to decrease from centromeric neighborhoods toward the 5' and 3' ends of chromosomes.

Combining LTR_FINDER and Unit III, we investigate for the first time LTR retrotransposons in Medicago truncatula (Mt), a model plant of the Fabaceae family. We identify 526 full-length elements and annotate 17,421 copies related to them. The elements are categorized into 85 families, 66 of which are reported for the first time. We analyze the protein organization and PBS usage of the LTR families and their phylogenetic relationships. We find that the majority of LTR elements in Mt belong to either the Copia or the Gypsy superfamily, and that Copia-like families are more than three times as numerous as Gypsy-like families, although the latter are more active. Analysis of the amplification-deletion pattern of Mt LTR elements shows that detectable full-length LTR retrotransposons are relatively young, most of them having inserted within the last 0.52 million years. We estimate that tens of Mb of Mt DNA have been removed by deletion of LTR elements, and that the rate of removal has been more rapid than in rice. For several important families we describe their structure, abundance in the genome and sequence conservation.
Finally, we investigate the activity of elements homologous to Mt LTR families in two other legumes, Lotus japonicus and Glycine max. We find that: 1) Compared with elements from the Gypsy superfamily, those from the Copia superfamily have been more active in both genomes. 2) The activity of LTR elements is highly lineage-specific. 3) LTR retrotransposons have very likely contributed substantially to shaping large legume genomes.
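
As an illustration of the first two LTR_FINDER steps described in the abstract (finding exactly matched pairs under distance restrictions, then merging neighboring pairs), the following is a minimal Python sketch. It is not the dissertation's implementation: it swaps the linear-time suffix-array algorithm for a simple k-mer dictionary index, and it merges only seeds on the same diagonal, so it tolerates mismatches but not indels. The function names and the parameters `k`, `min_dist`, `max_dist` and `max_gap` are hypothetical and chosen for this toy example.

```python
from collections import defaultdict


def exact_pairs(seq, k, min_dist, max_dist):
    """Find (i, j) starts of identical k-mers with min_dist <= j - i <= max_dist.

    Assumption: a plain k-mer index stands in for the suffix array
    named in the abstract; it is simpler but not linear-time.
    """
    index = defaultdict(list)
    for pos in range(len(seq) - k + 1):
        index[seq[pos:pos + k]].append(pos)  # positions stay in ascending order
    pairs = []
    for positions in index.values():
        for a, i in enumerate(positions):
            for j in positions[a + 1:]:
                if min_dist <= j - i <= max_dist:
                    pairs.append((i, j))
    return sorted(pairs)


def merge_pairs(pairs, k, max_gap):
    """Chain seeds that share a diagonal (same j - i) and lie within max_gap.

    Returns (left_start, right_start, length) for each merged pair.
    """
    by_diag = defaultdict(list)
    for i, j in pairs:
        by_diag[j - i].append(i)
    merged = []
    for diag, starts in by_diag.items():
        starts.sort()
        run_start, run_end = starts[0], starts[0] + k
        for s in starts[1:]:
            if s <= run_end + max_gap:
                run_end = max(run_end, s + k)  # extend the current run
            else:
                merged.append((run_start, run_start + diag, run_end - run_start))
                run_start, run_end = s, s + k
        merged.append((run_start, run_start + diag, run_end - run_start))
    return sorted(merged)


# Toy sequence: a 20 bp "LTR", a 30 bp spacer, and a copy with one mismatch.
left = "ACGTTGCAAGCTTAGGCTAC"
right = "ACGTTGCAAGGTTAGGCTAC"
toy = left + "T" * 30 + right

seeds = exact_pairs(toy, k=6, min_dist=30, max_dist=100)
print(merge_pairs(seeds, k=6, max_gap=3))  # prints [(0, 50, 20)]
```

In the toy run, the single mismatch splits the exact matches into two groups of seeds on the same diagonal, and the merge step joins them into one 20 bp highly similar pair. Boundary refinement and protein-domain validation, the later steps in the abstract, are outside the scope of this sketch.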
Keywords/Search Tags:bioinformatics, genome evolution, LTR retrotransposon, Oryza sativa, indica, japonica, indica-japonica differentiation, Medicago truncatula