A Fast And Accurate Sequence Composition Based Metagenomic Sequence Classification System For Unknown Environmental Samples

Posted on: 2019-02-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Qiao
GTID: 2480305891976129
Subject: Biological information
Abstract/Summary:
Background: Many methods have been developed for metagenomic sequence classification, and most of them depend heavily on the genome sequences of known organisms. As a result, a large portion of reads may be classified as unknown, which greatly impairs our understanding of the whole sample.
Results: Here we present MetaBinG2, a fast method for metagenomic sequence classification, designed especially for samples with a large number of unknown organisms. MetaBinG2 is based on sequence composition and uses GPUs to accelerate classification. One million 100 bp Illumina sequences can be classified within four minutes on a single GPU card. We applied MetaBinG2 to the MetaSUB Inter-City Challenge dataset provided by the CAMDA data analysis contest and compared the community composition structures of environmental samples from different public places across cities.
Conclusion: Compared with existing methods, MetaBinG2 is fast and accurate, especially for samples with significant proportions of unknown organisms. MetaBinG2 is available at http://cgm.sjtu.edu.cn/MetaBinG2Web.
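To illustrate what "based on sequence composition" can mean in practice, the following is a minimal sketch that turns a read into a normalized k-mer frequency vector. It is a generic illustration and not MetaBinG2's actual algorithm: the function name `kmer_composition`, the default `k=3`, and the choice to skip k-mers containing non-ACGT characters are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_composition(seq, k=3):
    """Return normalized k-mer frequencies of a DNA sequence.

    Hypothetical helper for illustration only. K-mers containing
    characters other than A, C, G, T (for example N) are ignored.
    """
    seq = seq.upper()
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Enumerate all 4**k possible k-mers so the vector has a fixed layout.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return {km: 0.0 for km in kmers}
    return {km: counts[km] / total for km in kmers}


comp = kmer_composition("ACGTACGT", k=2)
```

A composition-based classifier would compare such a vector against composition profiles of reference genomes, which requires no alignment and can therefore assign reads from organisms that are only distantly related to anything in the reference set.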
Keywords/Search Tags:Metagenome, MetaSUB, sequence classification, unknown environment, GPU