High-throughput SNP Analysis in Oilseed (Brassica napus L.) and Genome-Scale DNA Methylation Profiling in Brassica rapa Based on Reduced Representation Sequencing

Posted on: 2015-01-11  Degree: Doctor  Type: Dissertation
Country: China  Candidate: X Chen  Full Text: PDF
GTID: 1263330428456759  Subject: Crop Genetics and Breeding
Abstract/Summary:
The cultivated Brassica species include many economically important crops, such as Brassica rapa, Brassica oleracea and Brassica napus, and are among the closest relatives of Arabidopsis thaliana. Most members of the Brassicaceae family are polyploid: the diploids B. rapa and B. oleracea are considered to have undergone an ancient genome triplication, so that many genes are present in three copies, and the allotetraploid B. napus arose from natural hybridization between B. rapa and B. oleracea. Without a reference genome sequence, high-throughput analysis of SNP variation in B. napus has not been feasible. In addition, the presence of homoeologous sequences hinders genomic and epigenomic studies in Brassica. Using double-digestion reduced representation libraries and next-generation sequencing, we sequenced an oilseed DH population, designed the RFAPtools software to discriminate allelic SNPs from homoeologous sequence variations, and constructed two high-density genetic maps. By combining this approach with bisulfite treatment, we developed a modified RRBS (mRRBS) technology for genome-scale DNA methylation profiling in B. rapa.

1. Construction of a high-density genetic map in B. napus. Genetic maps have become essential tools for a wide range of genetic and genomic studies, and they depend largely on polymorphic molecular markers. The presence of homoeologous sequences and the absence of a reference genome sequence make the discovery and genotyping of single nucleotide polymorphisms (SNPs) more challenging in allotetraploid B. napus. To address this challenge, we developed a reduced representation library construction technology and designed a bioinformatics software package called RFAPtools.
RFAPtools consists of three modules: 1) assembly of a pseudo-reference sequence, 2) SNP identification and 3) discrimination of allelic SNPs from homoeologous sequence variations. Through in silico enzyme digestion, we analyzed the distribution of fragments across chromosomes, the fragment lengths and the amount of sequence data suitable for each individual. RFAPtools separates most homoeologous sequences during construction of the pseudo-reference sequence. In addition, based on population sequence data, the prf_allele.sh script discriminates allelic SNPs from homoeologous sequence variations. This methodology is therefore suitable for SNP analysis in all species, especially those with a complex genome structure and no reference genome sequence. A common set of restriction fragments was sequenced across a doubled haploid (DH) population (BnaNZDH) of the established allotetraploid Brassica napus and its two parents. Allelic SNPs and presence/absence variations (PAVs) were identified using RFAPtools. Two parallel linkage maps were constructed: an SNP bin map containing 8,780 SNP loci and a PAV linkage map containing 12,423 dominant loci. By aligning these linkage maps to the B. rapa reference genome sequence, we assigned 44 unassembled sequence scaffolds comprising 8.15 Mb to B. rapa chromosomes, and identified 14 instances of possible misassembly and eight instances of possibly mis-ordered sequence scaffolds. To verify the identified SNPs, we randomly selected 44 SNPs for direct Sanger sequencing and conversion to CAPS markers to detect polymorphism between the parents. Of these, 26 were confirmed; the PCR products of the other 18 SNP loci contained homoeologous sequences or did not yield the target sequences. We also surveyed the 91 DH lines to validate the SNP genotypes using the 26 confirmed SNPs. A total of 2,251 genotypes were generated, with an accuracy of 99.33%. Furthermore, we sequenced six DH lines in duplicate with different numbers of reads to assess reproducibility.
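The in silico double digestion mentioned above can be illustrated with a minimal sketch. It is not the RFAPtools implementation. The abstract does not name the enzyme pair, so EcoRI (G^AATTC) and MspI (C^CGG) are used purely as placeholders, and the function names and the size-selection parameters are hypothetical.

```python
import re

# Placeholder enzymes: the abstract does not say which pair was used.
# Each entry maps a name to (recognition site, cut offset within the site).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "MspI": ("CCGG", 1),     # C^CGG
}

def cut_positions(seq, enzymes=ENZYMES):
    """Return sorted (position, enzyme) cut points on the top strand."""
    cuts = []
    for name, (site, offset) in enzymes.items():
        # A lookahead pattern also reports overlapping recognition sites.
        for match in re.finditer(f"(?={site})", seq):
            cuts.append((match.start() + offset, name))
    return sorted(cuts)

def double_digest(seq, min_len=0, max_len=None, enzymes=ENZYMES):
    """Keep fragments flanked by two different enzymes and within the size range."""
    cuts = cut_positions(seq, enzymes)
    fragments = []
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        length = end - start
        if left == right:
            continue  # both ends cut by the same enzyme
        if length < min_len or (max_len is not None and length > max_len):
            continue  # outside the assumed size-selection window
        fragments.append((start, end, left, right))
    return fragments
```

Running `double_digest` on each chromosome of a related reference genome and tabulating the fragment coordinates and lengths gives the kind of distribution summary described in the text; the size window would be chosen to match the library's size selection.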
The consistency of SNP genotypes between the two replicates was higher than 99.88%, whereas the consistency of PAV genotypes was sensitive to the amount of sequence data, exceeding 98% with more than 1.50 million reads.

2. Genome-scale DNA methylation analysis in B. rapa. DNA methylation is one of the most important epigenetic modifications and influences gene transcription and transposon silencing. Recently, the epigenomes of many important plant species have been dissected using diverse high-throughput technologies. Here we modified the previously designed reduced representation library methodology to develop a modified RRBS (mRRBS) technology, and applied it to dissect genome-scale DNA methylation in B. rapa. We compared the sequences enriched by mRRBS with the whole genome sequence by calculating the percentage of the three cytosine contexts (CG, CHG and CHH) located in gene and transposon regions. The results were consistent, as were those from an in silico double-digestion study in rice, confirming that mRRBS can be used to dissect whole-genome DNA methylation. Using mRRBS, we calculated whole-genome methylation levels at CG and non-CG sites and observed overall genome-wide levels of 52.4% CG, 31.8% CHG and 8.3% CHH methylation. Most CG sites were either unmethylated or highly methylated, whereas 51.8% of CHG and 77.4% of CHH sites were hypomethylated. The chromosomal distribution of the average methylation level in the three contexts was positively correlated with repeat content and negatively correlated with gene content. Except for lower DNA methylation in the pericentromeric region of chromosome A02, extensive DNA methylation was detected around extant and ancient centromere regions. DNA methylation differed between gene and transposon regions, and its distribution in these regions was similar to that in Arabidopsis: lowest around the transcription start site and transcription termination region, and lower in the gene body than in upstream or downstream regions.
We also found stable, extensive DNA methylation along transposon regions. We profiled DNA methylation in gene regions belonging to the three paleogenomes and found the order LF < MF2 < MF1, without significant differences; this result was consistent with the gene expression level study. We also characterized DNA methylation in different components of single-copy and duplicated genes, and found higher methylation in single-copy than in duplicated genes, especially around the transcription start site and transcription termination region. We therefore propose that hypermethylated genes were prone to being discarded, whereas hypomethylated genes were more likely to be retained. Single-copy genes showed lower methylation levels in LF than in the other two subgenomes, but no consistent and significant difference was detected for duplicated genes among the three subgenomes. We consider that the differential DNA methylation among the three subgenomes was due to differential DNA methylation in single-copy genes, which resulted in the lowest gene loss ratio in LF compared with the other two subgenomes. Based on these B. rapa epigenomic studies, we uncovered a possible molecular mechanism controlling gene loss, and differential gene loss among the three subgenomes, in B. rapa.
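The per-context methylation levels reported above (CG, CHG and CHH) rest on two basic operations: classifying each cytosine by its sequence context and summarizing read counts per context. The sketch below illustrates both under stated assumptions. The abstract does not say how the genome-wide levels were computed, so pooling methylated and total read counts per context is an assumption, and the function names and the input tuple layout are hypothetical.

```python
from collections import defaultdict

def cytosine_context(seq, i):
    """Classify the forward-strand cytosine at index i as CG, CHG or CHH.

    H stands for A, C or T. Returns None if seq[i] is not C, or if the
    context cannot be determined (sequence end or ambiguous base N).
    """
    seq = seq.upper()
    if seq[i] != "C":
        return None
    following = seq[i + 1:i + 3]
    if following[:1] == "G":
        return "CG"
    if len(following) < 2 or "N" in following:
        return None
    return "CHG" if following[1] == "G" else "CHH"

def context_levels(sites):
    """Pooled methylation level per context.

    `sites` is an iterable of (context, methylated_reads, total_reads).
    Assumption: level = sum(methylated) / sum(total) within each context.
    """
    methylated = defaultdict(int)
    total = defaultdict(int)
    for context, m, n in sites:
        methylated[context] += m
        total[context] += n
    return {c: methylated[c] / total[c] for c in total if total[c] > 0}
```

Cytosines on the reverse strand can be handled by applying `cytosine_context` to the reverse complement. An alternative summary, averaging per-site levels instead of pooling reads, weights low-coverage sites more heavily and can give different percentages.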
Keywords/Search Tags:Brassica napus, Brassica rapa, DNA methylation, Gene loss, Genetic map, mRRBS, RFAPtools, SNP