Construction Of Pangenome And Evolutionary Analysis Of Genome Structural Variations In Sheep

Posted on: 2024-09-24  Degree: Doctor  Type: Dissertation
Country: China  Candidate: M Gong  Full Text: PDF
GTID: 1523307298460314  Subject: Animal breeding and genetics and breeding
Abstract/Summary:
Genomic structural variations (SVs) are large genomic alterations. They are expected to induce larger-scale perturbations in genes and regulatory regions than single nucleotide polymorphisms (SNPs), with an impact on gene expression and phenotype. Comprehensive characterization of SVs has been hindered by technical limitations in accurate detection and genotyping, which are now being overcome by recent advances in sequencing technologies and computational algorithms. High-fidelity (HiFi) reads delivered by the latest PacBio circular consensus sequencing (CCS) not only enable high-quality de novo assemblies but also empower SV detection with unprecedented precision and accuracy. Furthermore, the graph pangenome approach has largely addressed previous challenges in the discovery of diverse SV types, including multiallelic complex variations, and can in particular improve the SV genotyping accuracy of short-read data. Sheep are among the first domesticated livestock and have economic and cultural importance, but the SV spectrum of sheep remains to be explored across populations. In this study, we take advantage of HiFi sequencing technology to generate a diverse panel of 15 reference-quality sheep genomes, construct the first ovine pangenome, and build an extensive catalogue of sequence-resolved SVs. Through analysis of the SV spectrum in short-read data from 649 domestic and 41 wild sheep, we highlight SVs as so far largely unexplored genomic contributors to phenotypic variation. In addition, we explore the origin and evolution of SVs at the ruminant scale and find that ruminants, represented by cattle, sheep and goats, have similar evolutionary characteristics of SVs. The main results are as follows:

1. High-quality de novo assemblies of 15 individuals from different sheep breeds

We performed HiFi sequencing for 15 individuals of genetically diverse sheep breeds and generated one primary haplotype-collapsed assembly and a pair of partially phased assemblies for each individual. Most chromosomes of the 15 primary assemblies are longer than their counterparts in the reference genome. Combined, these primary assemblies filled 82.1% of the gaps in the reference genome and contain many more centromeric sequences. Fourteen of the primary assemblies have comparable or better continuity, completeness and quality value (QV) than the reference genome.

2. Construction of the ovine pangenome and SV spectrum

The 30 partially phased assemblies were integrated into a multiassembly graph to construct an ovine graph pangenome, using the sheep reference genome as the backbone. The graph recovered an extra 130.3 Mb of sequence absent from the reference genome. An SV catalogue with precise breakpoints was discovered from the 15 individuals by leveraging HiFi reads and a genome graph. It contains 149,158 insertions/deletions and abundant complex variations, including 6,531 divergent alleles and 14,707 multiallelic variations. We found 201 SV hotspots covering 182 Mb of the autosomes, which are enriched in repetitive sequences and 4.1-fold overrepresented in the terminal 5 Mb region of each chromosome. Furthermore, we found a 2.19-fold overrepresentation of SVs in segmental duplications (SDs), and 95.1% of these SVs lie entirely within SD regions, suggesting that SDs are a major driver of SVs in sheep.

3. Inference and characterization of SVs representing a derived state in sheep

We inferred the derived state of SVs in sheep and identified many more derived insertions than derived deletions (94,422 vs. 33,571). The derived insertions are dominated by recently active LINE-1 elements, which account for 78% of their total length. This indicates that the imbalance between insertions and deletions is an evolutionary consequence of recent active LINE expansions in sheep. We further investigated how frequently SVs were linked to nearby SNPs and found that 40.6% of the SVs displayed low to moderate linkage disequilibrium (LD) with surrounding SNPs, and that most SVs (77.6%) cannot be tagged by SNP probes from the widely used ovine 50K SNP chip.

4. Selection signatures of SVs and functional candidates for tail morphology in sheep

We identified 622 SVs putatively under selection during domestication and found that, for most differentiated SVs (76.69%), the FST of the surrounding SNPs was below the top 1% of the global SNP FST distribution, indicating that these SVs would probably be missed by traditional SNP-based studies. We next identified 929 population-stratified SVs among 690 sheep from around the world. The strongest signals are associated with five genes (IRF2BP2, BMP2, RXFP2, HOXB13 and PDGFD) with roles in sheep morphology. A novel 168-bp insertion in the 5' UTR of HOXB13 was found at high frequency in long-tailed sheep. Further genome-wide association study (GWAS) and gene expression analyses suggest that this mutation is causative for the long-tail trait. Furthermore, we identified two regions previously reported as candidate selective sweeps for the fat-tail trait, which contain nine highly differentiated SVs located in PDGFD and between BMP2 and HAO1.

5. The origin and evolution of SVs in cattle, sheep and goats

We constructed SV maps of cattle, sheep and goats based on HiFi sequencing data. The SVs of the three species have similar length distributions, indicating that ruminant SVs may follow the same evolutionary principles. Based on the high-quality genomes of 25 ruminants, we traced the SVs of cattle, sheep and goats and found that they have similar origins. The vast majority of SVs originate from closely related species with a recent LINE expansion, suggesting that the recent expansion of active LINEs may be the main driving force for the evolution of ruminant SVs. In addition, the number of SVs in cattle and sheep is much higher than that in goats, possibly due to the higher genetic diversity in cattle and sheep or the influence of introgression. Compared with cattle and goats, sheep have many more insertions than deletions, suggesting that the sheep pangenome keeps expanding.

In conclusion, we report a reference panel of high-quality genome assemblies for 15 diverse sheep breeds and present a comprehensive, high-confidence SV catalogue based on HiFi sequencing. Our study is a proof of principle for extending the ovine pangenome to larger cohorts in order to associate SVs with more traits. In addition, by combining high-quality ruminant genomes with the SVs of cattle and goats, we further explored the origins of SVs at the ruminant scale, providing new insights into the evolution of SVs in ruminants.
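
The question raised in result 3, whether an SV can be "tagged" by a nearby SNP, can be illustrated with a small sketch. It is not the method used in the dissertation, which the abstract does not describe. It assumes unphased genotypes coded as alternate-allele counts (0, 1, 2), uses the squared Pearson correlation of those counts as an approximation of LD r², and applies a hypothetical tagging threshold of 0.8. The function names and the toy genotypes are invented for illustration.

```python
def genotype_r2(sv_dosages, snp_dosages):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2).

    This is a genotype-based approximation of LD r^2 (an assumption of this
    sketch), not a haplotype-based estimate.
    """
    if len(sv_dosages) != len(snp_dosages):
        raise ValueError("genotype vectors must have the same length")
    n = len(sv_dosages)
    mean_x = sum(sv_dosages) / n
    mean_y = sum(snp_dosages) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(sv_dosages, snp_dosages))
    var_x = sum((x - mean_x) ** 2 for x in sv_dosages)
    var_y = sum((y - mean_y) ** 2 for y in snp_dosages)
    if var_x == 0 or var_y == 0:
        # A monomorphic variant has no defined correlation; treat as untagged.
        return 0.0
    return cov * cov / (var_x * var_y)


def is_tagged(sv_dosages, nearby_snps, threshold=0.8):
    """True if any nearby SNP reaches the (hypothetical) r^2 threshold."""
    return any(genotype_r2(sv_dosages, snp) >= threshold
               for snp in nearby_snps)


# Toy genotypes for eight individuals (illustrative only).
sv = [0, 1, 2, 0, 1, 2, 0, 0]
snps = {
    "snp_a": [0, 1, 2, 0, 1, 2, 0, 0],  # identical to the SV genotypes
    "snp_b": [2, 1, 0, 2, 1, 0, 2, 2],  # perfectly anti-correlated with the SV
    "snp_c": [1, 1, 1, 0, 2, 1, 0, 1],  # only weakly correlated with the SV
}

for name, genotypes in snps.items():
    print(name, round(genotype_r2(sv, genotypes), 3))
print("tagged by snp_c alone:", is_tagged(sv, [snps["snp_c"]]))
```

An SV whose best r² with every SNP on a genotyping array stays below the threshold would go undetected in an array-based association study, which is the concern behind the 77.6% figure reported above.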
Keywords/Search Tags: sheep, de novo assembly, pangenome, genomic structural variation, HOXB13