
The Construction Of Physical Map Based On BAC Library Profiling And Its Sequencing And Assembly Of Rapeseed (Brassica napus L.)

Posted on: 2019-07-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Cheng
GTID: 2393330545491096
Subject: Crop Genetics and Breeding
Abstract/Summary:
Rapeseed (Brassica napus L.) is the third largest oil crop after soybean and oil palm. China's cultivated area and yield of rapeseed were once the highest in the world, but production and oil content are lower than in Europe and Canada. The low comparative benefit of rapeseed planting has led to a decline in planting area and a heavy dependence on imports in recent years. Raising yield and oil content through genetic improvement of varieties is the fundamental way to boost the rapeseed industry in China. A high-quality reference genome is of great significance for locating and cloning genes underlying important agronomic traits of rapeseed. At present, two reference genomes of Brassica napus, Darmor-bzh and Zhongshuang11, have been published. Both were constructed by the whole-genome shotgun method with next-generation sequencing technology, and their genome coverage is about 80%. The common disadvantages of these two reference genomes are low coverage, unanchored scaffolds, misassemblies and gaps, which confuse gene localization, cloning and chromosome structure analysis. It is therefore necessary to build a high-quality reference genome of rapeseed using new-generation sequencing technology and the clone-by-clone method. In this study, we built BAC contigs from the Zhongshuang11 (ZS11) BAC library by the whole genome profiling method and located the contigs on chromosomes to obtain a physical map. BACs on the minimum path of the physical map were selected and sequenced. Meanwhile, the whole genome of ZS11 was sequenced on the PacBio Sequel platform to assist the assembly of each BAC. The results are as follows:

(1) Physical map construction: The ZS11 BAC library, consisting of 73,728 clones with an average insert length of 120 Kb, was arrayed in 192 384-well plates. Every six 384-well plates were treated as a unit, placed in a "2 (row) × 3 (column)" format, and pooled into 48 row pools and 48 column pools by mixing all the BACs in each row or column, resulting in 96 pools per unit, 32 units and 3,072 pools in total. Plasmids of each pool were digested with Sac I/Mse I and ligated to barcoded adapters for NGS PE150 sequencing. A total of 1.02 billion PE150 reads were obtained. After removing E. coli contamination (4.4%), the PE150 reads were merged into 90 bp × 2 tags, which were assigned to individual BAC clones by their barcodes and row/column pool intersections, resulting in 63,454 clones with 16 tags each on average and 10,274 clones without tags. The number of tags per BAC ranged from 0 to 220. Finally, using the FPC software with a cutoff of 10^-15, a total of 4,049 contigs were constructed, with 10 BAC clones each on average. The number of BACs per contig ranged from 0 to 142. The 21,123 clones not placed in any contig are referred to as singletons. Of the contigs, 2,934 (72.46%) were located on chromosomes using 37,607 genetic markers from the high-density genetic map constructed from the NAM population in our laboratory.

(2) BAC selection and sequencing: A total of 10,846 BACs on the minimum path of the physical map were selected for sequencing. Before large-scale sequencing, we evaluated the effect of sequencing depth on BAC assembly and found that the best depth was 500×. A separate library was built for each BAC clone and sequenced at 500× depth with a 400 bp insert size in PE150 mode. A total of 266.74 Gb of data was obtained, of which 186.9 Gb of clean data remained after removing plasmid vector sequence, E. coli sequence and PCR duplicates.

(3) Long-read sequencing of the whole genome: The whole genome of Zhongshuang11 was sequenced on the PacBio platform at 80× depth, yielding 97.07 Gb of data in total. The subread N50 was 11,767 bp and the average read length was 8,378 bp.

(4) BAC assembly: After testing k-mer sizes and assembly software, we chose SOAPdenovo with k-mer = 95 for NGS assembly of the 10,846 selected BACs, obtaining contigs with an average N50 of 10 Kb. We then extracted 1,800 subreads on average for each BAC by aligning the NGS contigs against the subreads with the blasr software. The conditions for subread extraction were: (a) for contigs shorter than 10 Kb, the mapped length exceeds 90% of the contig length; (b) for contigs longer than 10 Kb, the mapped length exceeds 70% of the contig length; (c) for subreads, the cumulative mapped length exceeds 50% of the subread length. After assembly with the subreads, the average contig N50 was 120 Kb. In total, 10,764 BACs were assembled, of which 8,901 were assembled as one complete fragment and 1,665 as 2-3 fragments.

(5) Result evaluation: We randomly chose 6 BAC assemblies and aligned the NGS data to them with bowtie. We examined read depth and NGS coverage on each assembly: all 6 clones were fully covered and no extreme depth was found, which supports the correctness of the BAC assemblies. The 10,764 BAC assemblies were compared with the published Zhongshuang11 reference genome; the BACs covered 67.56% of the genome and the sequence similarity was 99%, indicating the accuracy of the BAC assemblies.
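
The pooling arithmetic in result (1) can be checked with a short sketch. It assumes each unit forms a 48 × 48 well grid, which is what six standard 384-well plates (16 rows × 24 columns each) give when stacked three plates deep and two plates across. This layout, the function names, and the rule that a tag seen in exactly one row pool and one column pool of a unit belongs to the clone at their intersection are illustrative assumptions, not details confirmed by the abstract.

```python
# Sketch of a two-dimensional (row/column) pooling design.
# The plate arrangement and all names below are assumptions for illustration.

PLATE_ROWS, PLATE_COLS = 16, 24        # standard 384-well plate
PLATES_DOWN, PLATES_ACROSS = 3, 2      # assumed arrangement giving a 48 x 48 grid
TOTAL_PLATES = 192                     # from the abstract

GRID_ROWS = PLATE_ROWS * PLATES_DOWN         # row pools per unit
GRID_COLS = PLATE_COLS * PLATES_ACROSS       # column pools per unit
PLATES_PER_UNIT = PLATES_DOWN * PLATES_ACROSS
UNITS = TOTAL_PLATES // PLATES_PER_UNIT
POOLS_PER_UNIT = GRID_ROWS + GRID_COLS
TOTAL_POOLS = UNITS * POOLS_PER_UNIT
TOTAL_CLONES = TOTAL_PLATES * PLATE_ROWS * PLATE_COLS


def assign_tag(row_pools, col_pools):
    """Return the (row, col) grid position of a tag within one unit.

    row_pools and col_pools are the sets of pool indices in which the tag
    was observed. A tag is assigned only when it occurs in exactly one row
    pool and one column pool; otherwise None is returned (ambiguous).
    """
    if len(row_pools) == 1 and len(col_pools) == 1:
        return (next(iter(row_pools)), next(iter(col_pools)))
    return None


def grid_to_well(row, col):
    """Map a 0-based grid position to (plate index in the unit, well name)."""
    plate = (row // PLATE_ROWS) * PLATES_ACROSS + (col // PLATE_COLS)
    well = f"{chr(ord('A') + row % PLATE_ROWS)}{col % PLATE_COLS + 1}"
    return plate, well
```

Under these assumptions the constants reproduce the figures in the abstract: 32 units, 96 pools per unit, 3,072 pools and 73,728 clones. A tag found in more than one row pool or column pool of a unit is left unassigned here; a real pipeline would need a rule for such tags, which the abstract does not describe.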
Keywords/Search Tags:Brassica napus, genome, sequencing, assembly, physical map