
Two Dimensions Of The Complexity Of Eukaryotic Genomes: Alternative Splicing And Pan-genome

Posted on: 2017-02-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Q Hu
GTID: 1360330590990930
Subject: Biology
Abstract/Summary:
Increasing evidence suggests that neither genome size nor the number of protein-coding genes reflects the complexity of a species. The total number of proteins that a species can generate, however, may be a valid criterion. Alternative splicing enables a genome to generate a tremendous number of proteins from a limited number of genes and may therefore reflect the complexity of a genome. In this thesis, we propose a new strategy to evaluate the extent of alternative splicing, illustrated with the human genome. To begin with, we developed an ab initio predictor (ALTSCAN) that enables the prediction of alternatively spliced transcripts. ALTSCAN predicts as many protein-coding transcripts as possible based on genomic sequences only. Next, we applied ALTSCAN to the human genome and filtered its predictions with RNA-seq data from diverse tissues and cell lines. As a result, we detected about 30,000 novel transcripts with coding potential, with an accuracy of 84.1% as estimated by parallel real-time PCR verification. In addition, 36 novel proteins were detected in shotgun proteomics data from breast tissue. Moreover, by comparison with transcripts in public databases, we estimated the total number of human transcripts with coding potential to be at least 200,000.

Another dimension of genome complexity is intra-species variation. An individual genome cannot reflect the total set of proteins that a species can generate; therefore, intra-species variations should also be taken into consideration. These variations are mainly manifested as single nucleotide variations (SNVs), structural variations (SVs) and gene presence-absence variations (gene PAVs), which are the focus of this thesis. Gene PAVs refer to genes that are present in some but not all individuals of a species. Gene PAVs can be inferred by pan-genome studies, which have been widely carried out in bacteria but are still at an early stage in eukaryotes. Due to the large size of eukaryotic genomes, the number of individuals involved in pan-genome studies has been restricted to no more than 7, which has hindered the detection of gene PAVs. In this thesis, we first describe a novel strategy (EUPAN) that enables eukaryotic pan-genome studies at relatively low sequencing depth. EUPAN calculates gene PAVs by mapping the reads of each individual to the pan-genome sequences. Because sequencing data of individual human genomes are currently limited, we used sequencing data from the 3,000 rice genome project to explore gene PAVs within Asian cultivated rice (O. sativa). First, we detected 12,465 novel genes that are absent from the reference genome (IRGSP-1.0). Next, we found that at least 37.7% of the gene families in O. sativa showed PAVs and that those gene families accounted for more than 20% of an individual genome. Moreover, we reconstructed the phylogenetic relationship based on gene PAVs. Finally, we demonstrated that phenotypic differences could be explained by gene PAVs. Our results reveal that gene PAVs are widespread in Asian cultivated rice, suggesting that gene PAVs are underestimated in current studies of eukaryotic genomes and that the pan-genome is an important dimension of genome complexity.

In summary, this thesis discusses two important dimensions of the complexity of eukaryotic genomes: alternative splicing and the pan-genome. Our study provides novel insights into genome complexity.
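As an illustration of how gene PAVs might be derived from read mapping, the following is a minimal sketch and not the EUPAN implementation. It assumes that, for each gene and each individual, the fraction of the gene covered by mapped reads has already been computed. The function names, the input layout and the 0.95 coverage threshold are hypothetical choices made for this example only.

```python
from typing import Dict, List, Tuple

# Hypothetical input layout: coverage[gene][individual] is the fraction (0.0-1.0)
# of the gene that is covered by reads mapped from that individual.
Coverage = Dict[str, Dict[str, float]]
Presence = Dict[str, Dict[str, bool]]


def call_gene_presence(coverage: Coverage, min_cov: float = 0.95) -> Presence:
    """Call a gene 'present' in an individual when its covered fraction
    reaches min_cov. The 0.95 default is an assumed, illustrative threshold."""
    return {
        gene: {ind: frac >= min_cov for ind, frac in per_ind.items()}
        for gene, per_ind in coverage.items()
    }


def split_by_pav(presence: Presence) -> Tuple[List[str], List[str]]:
    """Split genes into those present in every individual and those
    missing from at least one individual (i.e. showing presence-absence variation)."""
    in_all, with_pav = [], []
    for gene, calls in presence.items():
        (in_all if all(calls.values()) else with_pav).append(gene)
    return in_all, with_pav


def pav_fraction(presence: Presence) -> float:
    """Fraction of genes that are absent from at least one individual."""
    in_all, with_pav = split_by_pav(presence)
    total = len(in_all) + len(with_pav)
    return len(with_pav) / total if total else 0.0
```

With a toy table of three genes and three individuals, a gene covered at 0.97 or more everywhere is placed in the first list, while a gene covered at only 0.10 in one individual is placed in the second. A real analysis would also need to handle sequencing depth, gene families and mapping ambiguity, which this sketch ignores.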
Keywords/Search Tags: Alternative splicing, transcript prediction, gene presence-absence variation, pan-genome, Asian cultivated rice