| Cattle are one of the most important livestock species because of their productivity and their role in human culture. Domestic cattle are mainly divided into taurine cattle (Bos taurus) and indicine cattle (Bos indicus), which originated from independent domestication events in the Near East and the Indus Valley, respectively. These have spread around the world to form six commonly accepted cattle groups: European taurine, African taurine, Asian taurine, Indian indicine, African indicine, and Chinese indicine. Among these, Chinese indicine cattle are thought to have spread into China between 3,500 and 2,500 years before present. Previous studies have shown that Chinese indicine cattle have a complex genetic background, which makes them a valuable gene pool for cattle molecular breeding. However, these genomic data, rich in genetic variation, have not been systematically investigated. To this end, we took advantage of the latest sequencing technologies and research methods and analyzed Chinese indicine genomes from five aspects: high-quality de novo assembly of Chinese indicine genomes, construction of a Chinese indicine multi-assembly graph, generation and characterization of a catalog of structural variants (SVs) in Chinese indicine, inference of introgression history, and adaptive introgression of structural variation. In this study, we generated 18- to 24-fold coverage of accurate circular consensus sequencing (PacBio HiFi) data, together with Illumina sequencing data, for 10 geographically distant female Chinese indicine cattle. We also collected fresh blood samples from 19 PiNan cattle (crossbred offspring of Piedmontese and Nanyang cattle) and performed whole-genome resequencing and transcriptome sequencing. In addition, we downloaded resequencing data for 394 individuals from public databases, covering seven wild relatives of domestic cattle and all six domestic cattle groups. Comprehensive and systematic analysis of these data yielded the following results: 1. By de novo assembly of 10 female Chinese indicine genomes, we obtained 30 
haplotype genomes at the chromosome level. The primary assemblies have a genome size of 2,679-2,714 Mb, a contig N50 of 18-91 Mb, and a BUSCO-estimated completeness of about 95.5%; compared with the reference genome ARS-UCD1.2, they contain longer centromere and telomere sequences and fewer gaps, and some chromosomes were assembled from centromere to telomere. The partially phased assemblies have a genome size of 2,585-2,698 Mb, a contig N50 of 1.24-13.45 Mb, and a BUSCO completeness of 89.3-93.8%. 2. Using the reference genome ARS-UCD1.2 as the backbone, we constructed a multi-assembly graph of Chinese indicine cattle, which contains 148.5 Mb of non-reference sequence. We extracted 74,907 non-reference alleles from the graph, of which 26.21% were also identified in wild Bos species. In this graph, we found 381 bubble breakpoints overlapping coding sequences of the reference genome, most of which correspond to highly polymorphic VNTR domains. In addition, we predicted 1,153 complete gene models de novo from the non-reference sequences, and a homology-based approach identified a total of 456 genes, of which 271 are novel. These predicted protein-coding genes mostly belong to multigene families and are enriched in olfactory transduction, immune response, signal transduction, and other pathways. 3. To obtain a reliable set of SVs in the Chinese indicine population, we used four long-read mapping-based approaches to detect SVs. We obtained 156,009 non-redundant high-confidence SVs, comprising 73,899 deletions and 82,120 insertions. These SVs are non-randomly distributed across the genome: we identified 206 SV hotspots spanning 195 Mb of the genome, of which 119 were novel. All SV hotspots were significantly enriched in protein-coding genes, which are mainly related to the immune system and olfactory transduction. 4. We identified 34,249 introgressed fragments from wild Bos species in the 20 high-quality Chinese indicine haplotype genomes, and we 
next analyzed the distribution of tree topologies for each introgressed fragment across several species of the genus Bos. We found five tree topologies among the introgressed fragments, indicating that the introgressed fragments in Chinese indicine cattle derive from five different wild Bos species. Between 3% and 16% of the sequence of each Chinese indicine genome came from other species, and on average 3.8%, 3.2%, 1.4%, 0.5%, and 0.6% of each individual’s genome was assigned to banteng-like, kouprey-like, gayal-like, gaur-like, and unknown origin, respectively. We then used the length distribution of introgressed fragments from each donor species to infer the introgression history; the results showed that the Chinese indicine population experienced five waves of introgression from different wild Bos species. The admixture pulses of banteng-like, kouprey-like, gayal-like, gaur-like, and unknown origin were estimated to have occurred 532-560, 763-807, 656-713, 912-1,039, and 1,517-1,721 generations ago, respectively. The youngest of these inferred admixture pulses, that from the banteng-like source, occurred 3,192-3,360 years ago, assuming a generation time of 6 years. In addition, nucleotide diversity was significantly higher in Chinese indicine cattle than in other domestic cattle populations, and the number of variants in introgressed regions was significantly higher than in non-introgressed regions. Introgressed regions in which the two haplotypes shared the same inferred ancestry showed an average sequence difference of 1.92-2.23 variants per kb, whereas regions with haplotypes of different inferred ancestry had a higher average difference of 4.24-6.57 variants per kb, again indicating that multiple Bos donors were involved in the admixture. Taken together, these observations indicate that Chinese indicine cattle experienced extensive admixture with surrounding species of the genus Bos, which was instrumental in increasing the genetic diversity of this cattle 
population. 5. To complement the SNP-based introgression results above, we used the Chinese indicine SV set as a reference panel to genotype 394 domestic and wild Bos genomes from Illumina reads with a graph-based approach, and obtained 114,387 reliably genotyped autosomal SVs. By combining the introgressed fragments identified above with the presence or absence of SVs in wild Bos species, we identified a total of 3,136 Chinese indicine population-specific SVs introgressed from other Bos species, among which was a 6.3 kb insertion downstream of the ASIP gene. This high-frequency, adaptively introgressed insertion may be related to the tan coat color of Chinese indicine cattle. In addition, we generated blood transcriptome and whole-genome sequencing data from 19 PiNan cattle for allelic differential expression and SV-eQTL analysis, further confirming that some introgressed SVs affect gene expression; the affected genes are mainly related to disease resistance and energy metabolism. Taken together, these results may partly explain the effect of introgressed SVs on gene expression, and these SVs might contribute to the environmental adaptability of Chinese indicine cattle. |
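
The conversion from generations to years used for the banteng-like pulse is simple arithmetic, and the sketch below reproduces it. The generation ranges and the 6-year generation time are taken from the abstract; the names `PULSES_GENERATIONS` and `generations_to_years` are hypothetical, chosen for illustration. Only the banteng-like range in years is reported in the abstract. The other values printed here are the same multiplication applied under the same assumed generation time, not additional results from the study.

```python
# Admixture pulse estimates in generations, as reported in the abstract.
PULSES_GENERATIONS = {
    "banteng-like": (532, 560),
    "kouprey-like": (763, 807),
    "gayal-like": (656, 713),
    "gaur-like": (912, 1039),
    "unknown origin": (1517, 1721),
}

# Generation time assumed in the abstract (years per generation).
GENERATION_TIME_YEARS = 6


def generations_to_years(gen_range, generation_time=GENERATION_TIME_YEARS):
    """Scale a (low, high) range in generations to a (low, high) range in years."""
    low, high = gen_range
    return (low * generation_time, high * generation_time)


if __name__ == "__main__":
    for source, gen_range in PULSES_GENERATIONS.items():
        low, high = generations_to_years(gen_range)
        print(f"{source}: {low:,}-{high:,} years ago")
```

For the banteng-like source this gives 3,192-3,360 years ago, matching the range stated in the abstract.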