Assembly And Filtering Of Enriched Data From Exon Capture Across Species

Posted on: 2020-09-07
Degree: Master
Type: Thesis
Country: China
Candidate: H Yuan
Full Text: PDF
GTID: 2370330590483563
Subject: Biology
Abstract/Summary:
Exon capture is one of the most widely used methods for generating multi-locus data in the phylogenetics of non-model species. It enriches targeted exons by hybridizing RNA oligonucleotide probes (also called "baits") designed from transcriptomes or existing genomes with homologous regions. Lowering the stringency of the hybridization and washing steps enables baits to capture diverged sequences, which makes exon capture across species possible. De novo methods are more generally applied to assemble data from exon capture across species, since enriched sequences can be too diverged to map to reference sequences. Current de novo pipelines for exon capture have several problems: (1) Read files from exon capture are usually large. Some recent pipelines pass the entire read set to the assembler, which can exhaust available RAM. (2) Read depth of exon capture data can be low, and the assemblers embedded in some pipelines cannot assemble loci with low read depth. (3) Recent pipelines tend to detect paralogs without comparing sequences to a reference genome, an approach that cannot identify paralogs effectively. (4) Some pipelines require manual operation, which makes assembling a large number of samples laborious.

We therefore present Assexon, a streamlined pipeline that de novo assembles the sequences of targeted exons and their flanking sequences from raw reads. Reads from Lepisosteus osseus (4.37 Gb) and Boleophthalmus pectinirostris (2.43 Gb) were collected to test the performance of Assexon. The reads were captured using baits designed from the genome sequences of L. oculatus and Oreochromis niloticus. We compared the assembly performance of Assexon with PHYLUCE and HybPiper, which are commonly used pipelines for assembling UCE and Hyb-Seq data. A custom pipeline (abbreviated as "CP" hereafter) used to assemble exon capture data was also included in the comparison. Across different levels of phylogenetic divergence, Assexon accurately assembled almost twice as many loci as PHYLUCE and more than a thousand more loci than HybPiper, and recovered fewer paralogs. Assexon ran at least twice as fast as PHYLUCE and HybPiper. The assembly performance of CP and Assexon was similar in both tests, but one step in CP requires manual operation. Thus, Assexon can accurately and efficiently assemble large read sets from exon capture.

The assemblies produced by recent pipelines usually include flanking sequences. However, flanking regions are rarely incorporated into phylogenetic analysis because of the extremely high variability of intronic flanks. We developed a script (flank_filter.pl) to uncover alignable flanking sequences, which could help to investigate population histories or phylogenetic relationships at shallow taxonomic divergence. We collected exon capture data for 5 individuals of Siniperca chuatsi and 5 individuals of S. kneri from Song et al. to evaluate the variability among flanking sequences. At least 2% of pairwise distances (p-dist) among the flanking sequences of all sinipercids exceeded 0.4, and the highest p-dist was more than 0.8, which suggests that extremely high variability exists among both intraspecific and interspecific flanks. The flanking regions were then filtered using flank_filter.pl. The highest p-dist decreased to 0.41 after filtering, which indicates that flank_filter.pl can remove flanks with extremely high variability. Assexon also includes a script to filter poorly aligned coding sequences and scripts to select loci with reliable phylogenetic signal in the post-assembly phase. The filtered dataset can be reformatted and used as input for phylogenetic analysis.
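The pairwise distance (p-dist) screening described in the abstract can be illustrated with a short Python sketch. This is not the code of flank_filter.pl, whose algorithm is not given here. The function names, the treatment of gaps and ambiguous bases, and the 0.4 default cutoff are all assumptions chosen for illustration.

```python
from itertools import combinations

# Characters treated as missing data (an assumption for this sketch).
MISSING = set("-?N")


def p_distance(seq_a, seq_b):
    """Proportion of differing sites among positions where both sequences have a base.

    Returns None when the two sequences share no comparable site.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in MISSING or b in MISSING:
            continue  # skip sites with a gap or ambiguous base in either sequence
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        return None
    return mismatches / compared


def max_pairwise_p_distance(alignment):
    """Largest p-distance over all pairs in a {name: aligned_sequence} mapping."""
    distances = [p_distance(a, b) for a, b in combinations(alignment.values(), 2)]
    distances = [d for d in distances if d is not None]
    return max(distances) if distances else None


def keep_flank(alignment, max_p_dist=0.4):
    """Keep a flank alignment only if no pair exceeds the hypothetical cutoff."""
    worst = max_pairwise_p_distance(alignment)
    return worst is not None and worst <= max_p_dist
```

With this sketch, a flank alignment in which two samples differ at half of their shared sites would be rejected at the illustrative 0.4 cutoff, while an alignment whose most divergent pair differs at one site in ten would be kept. A real filter might instead trim only the divergent portion of a flank rather than discard the whole alignment.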
Keywords/Search Tags: Exon capture, phylogenomics, sequence assembly, data filtering