Collembola (springtails), which have unique morphological characteristics, are widely distributed in terrestrial ecosystems. They are one of the key groups for understanding the evolution of arthropods and the origin of hexapods, and are of considerable systematic value. The current four-order classification system remains controversial, and the monophyly and systematic status of several important groups are still unclear. Molecular studies based on multiple genes have not produced a fully resolved phylogeny. With advances in sequencing technology and falling sequencing costs, phylogenetic studies increasingly rely on next-generation sequencing (NGS). Unlike traditional phylogenetic studies, phylogenomics mainly uses omics data and methods to resolve the evolutionary relationships of key groups. Its advantages are a large number of informative sites and reliable results. However, phylogenomic studies are often constrained by the quantity and quality of specimens, by cost, and by computing resources. In Collembola, the way specimens are preserved and the animals' small body size impede the application of popular phylogenomic NGS techniques. Transcriptomic and hybrid enrichment sequencing remain the most prevalent methods for collecting phylogenomic data because of their relatively low computational demands and sequencing costs. However, transcriptome-based methods are constrained by the availability of fresh material, and hybrid enrichment is limited by the genomic resources needed for probe design, especially for non-model organisms. In contrast, whole-genome sequencing (WGS) has significant advantages in material preparation and cost. To promote the use of WGS in phylogenetics, this thesis presents a novel, computationally efficient WGS-based pipeline for phylogenomic studies, which has been successfully tested on hexapods. Using this pipeline, the thesis rapidly assembles collembolan genomes and designs a large number of genomic markers
for collembolan phylogeny reconstruction, ultimately yielding an objective and comprehensive phylogenetic framework for Collembola. The main results are as follows.

The first result is an exploration of the feasibility of phylogenomics from low-coverage whole-genome sequencing. The thesis presents a novel WGS-based pipeline that extracts essential phylogenomic markers through rapid de novo genome assembly from low-coverage genome data, using a series of computationally efficient bioinformatic tools. The pipeline was tested on 40 insects (genome sizes 0.1–2 Gbp), and each genome assembly was completed in 2–24 hours on a desktop PC. It extracted 872–1,615 near-universal single-copy orthologs (USCOs) per species. The pipeline also supports the development of ultraconserved element (UCE) probe sets and the extraction of the corresponding sequences. In this thesis, it generated probes for 15 species of Phthiraptera, containing 55,030 baits targeting 2,832 loci, from which 2,125–2,272 UCEs were ultimately extracted. The phylogenetic trees inferred from the two types of markers largely agreed with currently accepted topologies, indicating that markers produced by this pipeline are valid for phylogenomic studies. The thesis also investigated the effect of sequencing depth on the success rate of targeted marker extraction, using raw data from six insect genomes (0.1–1 Gbp). The results showed that a sequencing depth of 10–20× is sufficient to recover hundreds of targeted markers, that 20–30× gives the best extraction performance, and that for UCEs the depth can be as low as 5×. These results demonstrate the feasibility of conducting phylogenomic and evolutionary studies from low-coverage WGS for a wide range of organisms without reference genomes. The new approach has major advantages in data collection. It requires neither RNA preparation nor hybrid enrichment, and a single individual as small as 0.3 mm is sufficient for the entire pipeline. With low computing
requirements, the pipeline broadens the choice of loci, and many different types of molecular markers can be extracted for downstream analysis.

The second result is the design of phylogenetic molecular markers from omics data. Building on the earlier exploration of low-coverage WGS data, the thesis developed a practical pipeline for mining genomic resources. Based on Illumina WGS, the pipeline uses a series of computationally efficient bioinformatic tools to rapidly assemble and annotate genomes and to design USCOs and UCEs for the target group. It also generates a reference dataset in a format compatible with Benchmarking Universal Single-Copy Orthologs (BUSCO) and develops the corresponding hybridization enrichment probes for downstream marker extraction and analysis. Using this pipeline, collembolan universal marker sets (USCOs and UCEs) were designed from three published and 11 newly generated genomes. Both marker types were evaluated in silico for capture success and phylogenetic performance. The new genomes were assembled from Illumina short reads, and 9,585–14,743 protein-coding genes were predicted per genome using ab initio and protein homology evidence. A total of 1,997 universal single-copy orthologs were identified across the 14 collembolan genomes, and a custom USCO dataset for extracting single-copy genes was created and assessed. A new UCE probe set containing 45,815 baits targeting 1,866 loci was also developed. Across the 14 genomes, 1,437–1,865 USCOs and 975–1,186 UCEs were successfully captured. For marker design in Entomobryoidea, the annotation scheme was improved, and 20,416–36,765 protein-coding genes were predicted for eight newly assembled genomes using ab initio and protein homology evidence. Compared with the original annotation scheme, both annotation speed and the number of predicted genes improved markedly. The thesis identified 3,406 universal single-copy orthologs across 11 entomobryoid genomes, and created and assessed a custom USCO dataset for extracting
single-copy genes. The newly developed UCE probe set contains 84,156 baits targeting 4,030 loci. Across the 11 genomes, 2,404–3,091 USCOs and 3,079–3,481 UCEs were successfully captured. Phylogenomic reconstructions using these markers proved robust and provided new insight into collembolan relationships. The thesis demonstrates the feasibility of generating thousands of universal markers from WGS, providing a valuable resource for genome-scale studies in evolutionary biology and ecology.

The third result concerns the phylogenetics of Collembola. To further investigate phylogenetic relationships within Collembola, 39 species were selected, representing almost all major families of the four collembolan orders, and the newly sequenced genomes were rapidly assembled. Using the collembolan universal marker sets developed earlier, 719–1,865 USCOs and 620–1,355 UCEs were extracted. Phylogenetic inference was then conducted using a diverse set of analytical methods and complete matrices. The final results confirmed the monophyly of each of the four orders and yielded three main topologies. Maximum likelihood and Bayesian inference under site-heterogeneous models both recovered (Neelipleona + Poduromorpha) + (Entomobryomorpha + Symphypleona), a topology supported by topology tests, whereas ASTRAL analyses of the USCO datasets recovered only (Neelipleona + Poduromorpha). Divergence time estimation indicates three things:

- the basal diversification of living Collembola took place mainly in the Carboniferous;
- these lineages survived the Permian–Triassic extinction event;
- they subsequently diversified and expanded.

Ancestral character state reconstructions indicated that the ancestor of living springtails had functional eyes, a reduced prothorax, a segmented body, a long furcula, and no defensive secretions. They also indicated that the globular body form arose independently multiple times. In addition, the study detected evidence of positive selection in 13 mitochondrial protein-coding genes, which are involved in energy metabolism. The
branch leading to Neelipleona + Poduromorpha was under selective pressure. This may be closely related to the reduction of the eyes, the appearance of the prothorax, and the degeneration of the furcula. It may also reflect adaptive evolution associated with a habitat shift from aboveground to underground, in response to changing energy demands.

Using the phylogenomic pipeline proposed in this thesis, springtail genomes were rapidly assembled and annotated and then used to develop USCO and UCE markers specific to springtails. This is the first study to sample large-scale genomic data for this purpose. It is also the first to apply a variety of marker filtering strategies and tree-building methods to reconstruct the phylogeny of Collembola, which substantially improved the resolution of its deep nodes. Divergence time estimation, ancestral character state reconstruction, and positive selection tests suggest that Collembola may have undergone adaptive evolution associated with a habitat shift from aboveground to underground. The present study has important theoretical and practical significance for understanding the evolution of Collembola, and it provides an exemplar for phylogenomic studies of insects and other arthropods.
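The first study reports several sequencing-depth thresholds:

- 10–20× is enough for hundreds of markers;
- 20–30× gives the best extraction;
- UCEs remain recoverable at depths as low as 5×.

Once a genome size is known, these thresholds translate into a sequencing budget. The Python sketch below illustrates that arithmetic and is not part of the thesis pipeline. It makes three assumptions:

- mean depth equals total sequenced bases divided by genome size;
- reads are 150 bp paired-end;
- losses from trimming, duplicates, and contamination are ignored.

All function names are hypothetical.

```python
import math

# Assumption (not from the thesis): Illumina paired-end reads of 150 bp each.
READ_LENGTH_BP = 150
READS_PER_PAIR = 2


def raw_bases_needed(genome_size_bp: int, depth: float) -> float:
    """Sequenced bases needed for a target mean depth (depth = bases / genome size)."""
    return genome_size_bp * depth


def read_pairs_needed(genome_size_bp: int, depth: float,
                      read_length_bp: int = READ_LENGTH_BP) -> int:
    """Read pairs needed to reach `depth`, rounded up.

    Ignores trimming, duplicate, and contamination losses.
    """
    bases_per_pair = read_length_bp * READS_PER_PAIR
    return math.ceil(raw_bases_needed(genome_size_bp, depth) / bases_per_pair)


if __name__ == "__main__":
    # Genome sizes within the 0.1-2 Gbp range tested; depths as discussed above.
    for genome_bp in (100_000_000, 500_000_000, 2_000_000_000):
        for depth in (5, 10, 20, 30):
            gb = raw_bases_needed(genome_bp, depth) / 1e9
            pairs = read_pairs_needed(genome_bp, depth)
            print(f"{genome_bp / 1e9:.1f} Gbp at {depth:>2}x: "
                  f"{gb:5.1f} Gb raw data, {pairs / 1e6:6.1f} M read pairs")
```

For example, a 0.5 Gbp genome at 20× needs about 10 Gb of raw data, or roughly 33.3 million 150 bp read pairs under these assumptions. Real projects usually budget extra to cover trimming and duplicate losses.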
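The first and second studies both select near-universal single-copy orthologs (USCOs): genes that occur as a single copy in most of the sampled genomes. This abstract does not state the exact inclusion threshold. The Python sketch below therefore assumes a hypothetical rule: an orthogroup is kept if it is never multi-copy in any genome and is present as a single copy in at least 90% of genomes. It illustrates the filtering idea only. It is neither the BUSCO algorithm nor the thesis's own implementation, and the input table format, the toy counts, and all names are assumptions.

```python
from typing import Dict, List

# Hypothetical input format: orthogroup ID -> {genome name -> gene copy number}.
# A genome missing from the inner dict is treated as having zero copies.
CopyTable = Dict[str, Dict[str, int]]


def near_universal_single_copy(table: CopyTable, genomes: List[str],
                               min_fraction: float = 0.9) -> List[str]:
    """Return orthogroups that are never multi-copy and are present as a
    single copy in at least `min_fraction` of `genomes` (assumed rule)."""
    selected = []
    for og, counts in table.items():
        copies = [counts.get(g, 0) for g in genomes]
        if any(c > 1 for c in copies):
            continue  # duplicated in at least one genome: not single-copy
        present = sum(1 for c in copies if c == 1)
        if present / len(genomes) >= min_fraction:
            selected.append(og)
    return sorted(selected)


# Toy example with four genomes (made-up counts, for illustration only).
GENOMES = ["sp1", "sp2", "sp3", "sp4"]
EXAMPLE: CopyTable = {
    "OG1": {"sp1": 1, "sp2": 1, "sp3": 1, "sp4": 1},  # universal single-copy
    "OG2": {"sp1": 1, "sp2": 2, "sp3": 1, "sp4": 1},  # duplicated in sp2
    "OG3": {"sp1": 1, "sp2": 1, "sp3": 1},            # missing from sp4
}

if __name__ == "__main__":
    print(near_universal_single_copy(EXAMPLE, GENOMES))        # ['OG1']
    print(near_universal_single_copy(EXAMPLE, GENOMES, 0.75))  # ['OG1', 'OG3']
```

Lowering `min_fraction` admits more loci at the cost of more missing data in the final matrix. Marker filtering strategies have to balance this trade-off.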