
Research On Assembly Method Based On Metagenomic Sequencing Data

Posted on: 2019-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: A Q Zhang
Full Text: PDF
GTID: 2428330566998536
Subject: Computer technology
Abstract/Summary:
In recent years, the rapid development of sequencing technology has advanced the study of gene sequences. High-throughput sequencing offers high throughput and short run times, but the reads it produces are short, so they cannot be used directly for downstream analysis and must first be assembled with assembly tools. Assembly algorithms for high-throughput metagenomic sequencing data are therefore one of the core research directions in metagenomics. Many assembly methods have been proposed, but general-purpose assembly tools run for a long time, occupy a large amount of memory, and do not achieve sufficiently high coverage. This thesis studies the metagenome assembly problem in depth with the aim of improving on these issues.

The main work builds on the MEGAHIT assembly tool, adding new algorithms to address the corresponding problems. I first design experiments on MEGAHIT to identify directions for improvement. The whole graph is constructed as a succinct de Bruijn graph, which requires less memory and less running time. The structures in the graph are then analyzed, and the graph is further simplified according to the characteristics of each structure and the factors that cause it. A Random Forest model built from single-end reads is used to extend contigs; the model is built once and reused in later assemblies, so it does not need to be rebuilt before every assembly. This addresses the complex branches that arise when k-mers are missing because of low coverage. In the last step of assembly, the graph is divided so that each subgraph belongs to one species, and consensus contigs are output for each subgraph. This addresses the problem that contigs from similar subspecies prevent contigs from being extended further.

In the experiments, I use simulated datasets that have reference genome sequences: MetaSim is used to simulate two datasets, one with highly uneven sequencing depths and one with even sequencing depths. Analysis of the results on these two datasets shows that, compared with existing assembly tools, the method of this study does get better results. I also run experiments on a real dataset whose reference genomes are unknown; for this part I report only summary statistics, with no evaluation of accuracy.
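To make the graph-based steps above concrete, the following is a minimal sketch of de Bruijn graph construction and of spelling out non-branching paths as contigs. It is an illustration only, not the thesis's method and not MEGAHIT's code: it uses a plain Python dictionary rather than a succinct de Bruijn graph, ignores reverse complements and sequencing errors, and the function names `build_de_bruijn` and `unitigs` are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read.

    Plain-dictionary illustration; a succinct de Bruijn graph stores the
    same edges in a far more compact form.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph):
    """Spell out maximal non-branching paths as contig strings.

    Isolated cycles (every node has in-degree 1 and out-degree 1) are
    skipped in this sketch.
    """
    indeg = defaultdict(int)
    for dsts in graph.values():
        for dst in dsts:
            indeg[dst] += 1

    def is_simple(node):
        # A node that can be passed through without any branching choice.
        return indeg[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for node in list(graph):
        if is_simple(node):
            continue  # paths start only at branch points or sources
        for nxt in graph[node]:
            path = node + nxt[-1]
            while is_simple(nxt):
                nxt = next(iter(graph[nxt]))
                path += nxt[-1]
            contigs.append(path)
    return sorted(contigs)


# Two overlapping toy reads collapse into a single contig.
toy_graph = build_de_bruijn(["ATGGCGT", "GGCGTGC"], k=4)
print(unitigs(toy_graph))  # ['ATGGCGTGC']
```

When two reads diverge after a shared prefix, the shared node gets two outgoing edges and the path is cut there. This is the kind of branch that the graph simplification and contig extension steps described above are meant to resolve.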
Keywords/Search Tags: microorganisms, metagenomics, high-throughput sequencing, assembly, contigs