An Optimization Of The Assembly Strategy Of Animal Mitochondrial Metagenomics

Posted on: 2021-11-29
Degree: Master
Type: Thesis
Country: China
Candidate: J Dong
GTID: 2480306605494464
Subject: Agricultural Entomology and Pest Control
Abstract/Summary:
Mitochondrial metagenomics (MMG), also called mito-metagenomics, is a specific form of metagenome skimming. It targets the mitochondrial fraction of bulk specimens with high-throughput sequencing and subsequently extracts mitochondrial sequences by bioinformatic analysis. MMG provides opportunities to rapidly obtain mitochondrial genomic data for a large number of species in an efficient and economical manner. It reduces errors introduced by PCR bias and provides a new methodology for the study of biodiversity and phylogeny. MMG is a new and rapidly expanding field, and most studies to date have not gone beyond the proof-of-concept stage. Study design still needs to be standardized, and key parameters require rigorous evaluation and optimization. The traditional MMG data generation process also requires a large amount of computational resources. In this study, we present a novel MMG pipeline for rapid mitogenome assembly. It integrates a fast, accurate read mapper for filtering non-mitochondrial reads with a seed-and-extend assembler that assembles species-specific mitogenomes while detecting 'noisy' species/sequences that could obstruct target assembly. We tested the pipeline on both simulated and real datasets to evaluate its completeness and accuracy.

The main text is divided into four chapters. The first chapter introduces the proposal and development of methods from DNA barcoding and DNA metabarcoding to MMG, summarizes the advantages and disadvantages of different strategies for mitochondrial genome sequencing, and reviews the experimental design, challenges and applications of MMG. The purpose of this study is then stated.

In the second chapter, a novel and efficient assembly strategy for MMG is proposed: non-mitochondrial reads are filtered out by mapping against a mitogenome database of closely related species, and the retained mitochondrial reads are then assembled separately for each species. Traditional blastn pairwise sequence alignment is usually time-consuming and not suitable for
aligning very divergent sequences. We therefore used NextGenMap as a better alternative to rapidly map raw sequencing data to a mitogenome database of related species, which was downloaded from the NCBI Reference Sequence Database. Candidate mitochondrial reads were then extracted with SAMtools, reducing the raw data by one order of magnitude (to roughly 10% of the original volume). Mitogenomes were assembled separately for each species with the seed-and-extend algorithm of the NOVOPlasty assembler, and assemblies were iteratively corrected until no noisy regions, including chimeras, were detected. We tested three simulated species-mixed datasets with COI divergence of 0.090-0.377, 0.036-0.289 and 0.015-0.166 (DS_A, Metazoa; DS_B, Hymenoptera; DS_C, Culicidae). Mitogenome assembly lengths (bp) were 14,361.2 ± 3,182.7 (3,876-17,633), 8,510.2 ± 5,241.2 (630-17,211) and 15,582.4 ± 445.5 (15,377-16,673) for DS_A, DS_B and DS_C, respectively, with corresponding genome coverage (%) relative to the reference of 95.1 ± 20.1, 58.2 ± 34.2 and 100 ± 0.1. Filtering and assembly for each dataset took less than 24 hours on a desktop computer (8 cores/16 threads, 32 GB memory). Chimera detection and removal in our pipeline eliminated most errors but still failed for two mosquito species with 1.5% COI divergence. We also tested the species detection rate and the accuracy of species abundance estimates under varying sequencing coverage on simulated data including only closely related species, and concluded that 1 Gbp of data for a single species-mixed sample would be enough to detect species occurrence and abundance in most real cases. In summary, our study provides a fast, accurate MMG assembly strategy and validates the great power of the read-based approach in biodiversity quantification and measurement.

In the third chapter, we evaluated the performance of the novel MMG assembly strategy on specific examples. We tested three samples of pooled species (DB, Isotomidae; 33mix, Panbenthos; DBTR, Pan-soil fauna) with the assembly strategy proposed in chapter two. Mitogenome assembly lengths (bp) were
14,078.0 ± 3,758.4 (284-16,812), 13,532.0 ± 4,380.6 (254-18,095) and 15,585.8 ± 2,500.6 (6,000-22,026). Most mitogenomes contain the complete mitochondrial gene set, i.e. 13 protein-coding genes (PCGs), 22 transfer RNAs (tRNAs) and 2 ribosomal RNAs (rRNAs). The number of PCGs, which carry the most abundant genetic information, reached 12.89 ± 0.31 (12-13), 12.43 ± 1.50 (8-13) and 12.39 ± 1.72 (5-13), further verifying the completeness of the assemblies. Accuracy (%) was assessed as the identity between the assembled sequence and the standard barcode reference (COI, 658 bp) of each species, which was 99.21 ± 0.91, 99.21 ± 0.91 and 99.46 ± 3.25 for datasets DB, 33mix and DBTR, respectively. Only three species in the DB dataset and two in the DBTR dataset produced chimeric fragments, which were eliminated by our pipeline. We also summarized the base content and GC skew of each sequence and showed partial structural sketches of the mitogenomes. There were no significant differences in completeness or accuracy between the results of chapters two and three. All chimeras across the datasets were compared, and we concluded that the COI divergence between specimens should exceed 10% at the pooling step of real studies, so as to avoid errors caused by closely related species. In summary, our study validated the feasibility of the new MMG assembly strategy on real datasets and provides a technical basis for the construction of a future "super barcode" (mitogenome) database.

In the fourth chapter, we summarize the main research results of this thesis and discuss prospects for MMG technology in the development of batch annotation functions, construction of mitogenome databases, species detection and biodiversity assessment. MMG will be applied more widely in ecological and evolutionary studies as sequencing costs decrease.
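
The pooling guideline from chapter three (COI divergence above 10% between pooled specimens) could be checked before sequencing with a sketch like the one below. This is only an illustration, not the thesis's own code. It assumes the COI barcodes are already aligned to equal length and uses the uncorrected p-distance as the divergence measure, whereas the abstract does not state which distance was used. The function names `p_distance` and `risky_pairs` and the toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Uncorrected p-distance between two aligned sequences of equal length.

    Assumption: columns with a gap ('-') or an ambiguous base ('N') in
    either sequence are skipped rather than counted as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def risky_pairs(aligned, threshold=0.10):
    """Return (name_a, name_b, distance) for every pair of specimens whose
    divergence does not exceed the threshold (default 10%)."""
    flagged = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(aligned.items(), 2):
        d = p_distance(seq_a, seq_b)
        if d <= threshold:
            flagged.append((name_a, name_b, d))
    return flagged


if __name__ == "__main__":
    # Toy 20-bp "barcodes" for illustration only (real COI barcodes are 658 bp).
    aligned = {
        "sp1": "ATGC" * 5,
        "sp2": "ATGC" * 4 + "ATGA",  # differs from sp1 at 1 of 20 sites
        "sp3": "TTGC" * 4 + "ATGC",  # differs from sp1 at 4 of 20 sites
    }
    for name_a, name_b, d in risky_pairs(aligned):
        print(f"{name_a} vs {name_b}: divergence {d:.3f} is not above 10%")
```

Pairs flagged this way are candidates for placing in separate pools. A model-corrected distance such as K2P would give slightly larger values for the same sequences, so the p-distance check is the more conservative of the two.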
Keywords/Search Tags: biodiversity, mitochondrial metagenomes, high-throughput sequencing, genome assembly, species detection