Research On Assembler Based On De Bruijn Graph For Metagenomic Data

Posted on: 2016-11-17
Degree: Master
Type: Thesis
Country: China
Candidate: Z P Hu
GTID: 2180330464953811
Subject: Computer software and theory
Abstract/Summary:
Microbial communities carry a large amount of information relevant to human health, natural evolution, and ecological problems, and high-throughput sequencing technology makes it possible to obtain this information. Sequencing a sample of a microbial community yields metagenomic sequencing fragments (reads). Accurately classifying these fragments is an important precondition for recovering the true composition of the community, and therefore for the accuracy and efficiency of metagenomic studies. In recent years, the computational problem of reconstructing metagenomic sequences from DNA sequencing fragments has attracted wide attention; this thesis studies that problem.

Because most microbial genomes are unknown, this thesis proposes CLUSTERH, a de novo metagenomic sequence assembly method based on the de Bruijn graph and designed around the characteristics of such data. First, CLUSTERH decomposes the sequencing fragments into k-mers to build the de Bruijn graph and removes sequencing errors by adjusting the value of k. Second, different species share fewer similar regions than subspecies of the same species do. Based on this idea, CLUSTERH attempts to remove cr-branches from the de Bruijn graph, dividing it into a set of isolated subgraphs, each of which represents one species or several subspecies of a species. Finally, CLUSTERH obtains the gene sequences of each species by multiple sequence alignment.

CLUSTERH was tested and analyzed on biological data released on the website of the US National Center for Biotechnology Information (NCBI). The results show that CLUSTERH produces metagenomic assemblies of higher accuracy for sequencing data both with and without mate-pair fragments. Moreover, for sequencing data without mate-pair fragments, CLUSTERH under different parameter settings produces more accurate assemblies than the Meta-IDBA algorithm. This effectively relaxes the requirement for mate-pair sequencing data, which can further reduce the cost of sequencing and makes the algorithm more practical.

Based on the CLUSTERH method, a software package for testing metagenomic sequence assembly was designed and implemented. The package is developed in C++ and runs on 64-bit Linux. It consists of five modules: parameter setting, biological data reading, metagenomic assembly, result viewing, and analysis. The parameter setting module sets the biological data format, whether to use mate-pair information, the file paths, and so on, according to the case at hand. The data reading module reads sequencing fragments from text files in FASTA format. During metagenomic assembly, the package can dynamically display the partitioning of the graph; the final result is stored in an external file, and an analysis of the validity of the results is provided.

In conclusion, this thesis studies algorithms for the metagenomic sequence assembly problem, proposes an effective algorithm that achieves good assembly results, and provides a useful idea and method for solving the metagenomic data assembly problem.
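The graph-building and partitioning steps described in the abstract can be illustrated with a minimal Python sketch. This is not the CLUSTERH implementation (which the abstract says is written in C++), and it does not implement cr-branch removal, which the abstract does not define. The function names and the `min_count` threshold are hypothetical: the threshold stands in for error removal by dropping rarely seen k-mers, and the partitioning step simply takes the connected components of whatever graph remains.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_de_bruijn(reads, k, min_count=1):
    """Build a de Bruijn graph: nodes are (k-1)-mers, one edge per kept k-mer.

    min_count is an assumed error filter, not taken from the thesis:
    k-mers observed fewer than min_count times are discarded.
    """
    graph = defaultdict(set)
    for kmer, n in count_kmers(reads, k).items():
        if n >= min_count:
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def weakly_connected_components(graph):
    """Split the graph into isolated subgraphs, ignoring edge direction."""
    undirected = defaultdict(set)
    for src, dsts in list(graph.items()):
        for dst in dsts:
            undirected[src].add(dst)
            undirected[dst].add(src)
    seen, components = set(), []
    for start in list(undirected):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            component.add(node)
            for neighbour in undirected[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


if __name__ == "__main__":
    # Two toy reads with no shared 2-mers end up in separate subgraphs.
    toy_graph = build_de_bruijn(["ACGTAC", "TTGGCC"], k=3)
    for component in weakly_connected_components(toy_graph):
        print(sorted(component))
```

In this toy setting each isolated subgraph plays the role of one species; on real data, shared regions between genomes would connect the subgraphs, which is why the thesis removes branches before partitioning.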
Keywords/Search Tags: Metagenomic, Assemble, Clustering, Algorithm, De Bruijn graph