Advances in high-throughput genome sequencing have contributed significantly to uncovering structural variations in eukaryotic genomes and have helped researchers study inter- and intra-genomic and phenotypic variation. In the standard procedure for comparative genome analysis, whole-genome resequencing data are aligned to an established reference genome. Recently, with the increased understanding of structural variations, it has been concluded that a single reference genome is insufficient to represent the complete genomic content and diversity of a species. Hence, a general consensus has developed that the reference genome should be replaced with a pan-genome to explore and exploit the complete genomic content and genetic diversity of a species. Pan-genome studies have revealed that genomes at the species level are dynamic and can be sub-divided into a core genome (genes present in all members of a species) and a shell genome (genes present in a subset of members of a species). The shell genome is believed to be responsible for phenotypic variation among members of a species and for their evolution. Compared with eukaryotic pan-genomics, prokaryotic pan-genomics is very well established. Pan-genomes of prokaryotes have revealed many more functionally important genes than the genome of any single member of a species, whereas the prevalence and functional significance of differentially present genes in eukaryotes remain poorly understood. Since the advent of pan-genomics, several tools and pipelines have been introduced for prokaryotic pan-genomics. However, no comprehensive pipeline has been reported that overcomes the multiple challenges associated with eukaryotic pan-genomics.

To aid eukaryotic pan-genomic studies, in the first part of the present study we developed a new method and present ppsPCP (a plant presence/absence variants scanner and pan-genome construction pipeline), which is designed for eukaryotes, especially for
plants. It is capable of scanning presence/absence variants (PAVs) and constructing a fully annotated pan-genome. To the best of our knowledge, ppsPCP is the first pipeline capable of scanning PAVs and constructing a fully annotated pan-genome not only for plants but also for humans, animals and prokaryotes. ppsPCP was benchmarked with the model cereal species rice and the model dicot species Arabidopsis thaliana. Its output was compared with previously published pan-genomes of rice and A. thaliana, which were developed with different pipelines. The results revealed that ppsPCP performs better than other available pipelines in terms of speed, accuracy, sensitivity and CPU/memory usage. Moreover, in contrast to existing methods for plant pan-genome analysis, ppsPCP is an appropriate tool for constructing a pan-genome when members of a species have been sequenced by different approaches, including short reads and long reads, with reasonable coverage and quality. ppsPCP is freely available at http://cbi.hzau.edu.cn/ppsPCP/.

Citrus is an important nutritional source for human health and has huge economic value. Therefore, comprehensive Citrus pan-genome analyses were carried out in the second part of the present study. First, a sequence-based pan-genome of Citrus was constructed using ppsPCP. In total, 11 Citrus genomes were considered in this work: Citrus grandis, Murraya paniculata, Atalantia buxifolia, Poncirus trifoliata, Citrus ichangensis, Citrus reticulata, Fortunella hindsii, Citrus sinensis, Citrus medica, Citrus clementina and Citrus unshiu Marc. The resulting sequence-based pan-genome of Citrus includes a 620 Mb sequence, 44,623 genes and 193,753 PAVs in addition to the reference genome (C. grandis). Next, a gene-based pan-genome analysis was carried out to explore the core and shell genomes of Citrus. The results revealed that almost 4,936 biologically important protein-coding
genes are missing from the Citrus reference genome. The Citrus pan-genome was found to comprise 60.36% core genome and 39.64% shell genome. In addition, comparative analyses were carried out between the core and shell genomes. Compared with shell genes, core genes were found to be longer and to have more exons, lower nonsynonymous/synonymous substitution ratios, and greater distances to the closest upstream transposable elements. Gene ontology analyses revealed that core genes are involved in basic and necessary functions, whereas shell genes are involved in functions that are important in a particular area or environment. To further explore the shell genes, ontology network and in silico protein analyses were carried out. Most shell genes were found to be involved in functions related to the host defense response. A gene ontology (GO) network revealed that members of the plant phytochrome (Phy) family are highly enriched in the defense response network. Protein analyses of Phy family members revealed that PhyB of Citrus differs significantly in structure from that of A. thaliana. Its GAF domain, knot region and helical spine show distinct structural differences that are potentially important for signaling functions.

In the third and final part of the present study, the mechanisms of transposable element (TE) insertion/deletion and their role in the evolution and diversity of the Citrus genome were studied. Insertions of miniature inverted-repeat transposable elements (MITEs) were found to have resulted in massive polymorphisms and to have played an important role in Citrus genome diversity and gene structure variation. It was also observed that the insertion and deletion of LTR retrotransposons occurred in a dynamic balance and play significant roles in the genome diversity of different Citrus species.

In conclusion, we believe that, with its unique features of PAV scanning and building a fully annotated pan-genome, ppsPCP will be useful for plant pan-genomic studies and will help researchers study genetic/phenotypic
variations and genomic diversity. Furthermore, taken together, the Citrus pan-genome and TE analyses lead to the conclusion that the PAV sequences and genes identified in this study highlight the diverse genetic makeup of Citrus, with potential utility for future improvement and for developing breeding strategies. TEs are strong candidates for being responsible for the genetic variation of the shell genome. The Citrus pan-genome adds depth and completeness to the reference genome and is useful for future biological discoveries and functional studies.
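
The core/shell distinction used in this abstract can be illustrated with a minimal Python sketch. It assumes a simple gene presence/absence table; the gene names, accession names and the function `partition_pan_genome` are hypothetical and are not part of ppsPCP or of the Citrus data set. A gene found in every genome is counted as core, and a gene found in only a subset is counted as shell.

```python
from typing import Dict, Set, Tuple


def partition_pan_genome(
    presence: Dict[str, Set[str]], n_genomes: int
) -> Tuple[Set[str], Set[str]]:
    """Split genes into core (present in all genomes) and shell (present in a subset).

    `presence` maps a gene identifier to the set of genomes in which it was detected.
    Genes detected in no genome are ignored.
    """
    core: Set[str] = set()
    shell: Set[str] = set()
    for gene, genomes in presence.items():
        if len(genomes) == n_genomes:
            core.add(gene)
        elif genomes:
            shell.add(gene)
    return core, shell


# Hypothetical toy input: three accessions and four genes.
accessions = ["accession_A", "accession_B", "accession_C"]
presence = {
    "gene1": {"accession_A", "accession_B", "accession_C"},
    "gene2": {"accession_A", "accession_C"},
    "gene3": {"accession_B"},
    "gene4": {"accession_A", "accession_B", "accession_C"},
}

core, shell = partition_pan_genome(presence, len(accessions))
core_fraction = len(core) / (len(core) + len(shell))
print(sorted(core), sorted(shell), core_fraction)
```

Percentages such as the core and shell proportions reported above are fractions of this kind, although the actual analysis may apply additional filtering or clustering steps that this sketch does not attempt to reproduce.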