Research On Assembler Based On De Bruijn Graph For Metagenomic Data

Posted on:2016-11-17

Degree:Master

Type:Thesis

Country:China

Candidate:Z P Hu

Full Text:PDF

GTID:2180330464953811

Subject:Computer software and theory

Abstract/Summary:

Microbial communities carry a large amount of information relevant to human health, natural evolution, and ecological problems, and high-throughput sequencing technology makes it possible to obtain this information. Sequencing a sample of a microbial community yields metagenomic data in the form of sequenced fragments. Accurately classifying these fragments is an important precondition for recovering the true information about the microbial population, and thus for ensuring the accuracy and efficiency of metagenomic studies. In recent years, the computational problem of reconstructing metagenomic sequences from DNA sequencing fragments has attracted wide attention, and this thesis studies that problem.

Because most microbial genetic data are unknown, this thesis proposes CLUSTERH, a de novo assembly method for metagenomic sequences that is based on the De Bruijn graph and tailored to the characteristics of the data. First, CLUSTERH decomposes the sequencing fragments into k-mers to build the De Bruijn graph, and removes sequencing errors by adjusting the value of k. Second, it relies on the idea that genetic similarity between different species is lower than that within a species or between its subspecies, so different species share fewer similar regions. Based on this idea, CLUSTERH attempts to remove cr-branches from the De Bruijn graph, dividing it into a set of isolated subgraphs, each of which represents one species or the subspecies of several species. Finally, CLUSTERH obtains the gene sequences of the species by multiple sequence alignment.

CLUSTERH was tested and analyzed on biological data released on the website of the US National Center for Biotechnology Information (NCBI). The results show that, for sequencing data both with and without mate-pair fragments, CLUSTERH obtains metagenomic assemblies of higher accuracy. Moreover, for sequencing data without mate-pair fragments, CLUSTERH under different parameter settings produces more accurate assemblies than the Meta-IDBA algorithm. This effectively relaxes the requirement for mate-pair sequencing data, which can further reduce sequencing cost and makes the algorithm more practical.

Based on the CLUSTERH method, a software package for testing metagenomic sequence assembly was designed and implemented. The package is developed in C++ and runs on 64-bit Linux. It consists of five main modules: parameter settings, reading of biological data, metagenomic assembly, viewing of results, and analysis. The parameter-setting module sets, according to the specific circumstances, the biological data format, whether to use mate-pair information, the file paths, and so on. The data-reading module reads sequencing fragments from text files in FASTA format. During metagenomic assembly, the software can dynamically display the partitioning of the graph; the final result is stored in an external file, and an analysis of the validity of the result is provided.

In conclusion, this thesis studies algorithms for the metagenomic sequence assembly problem, proposes an effective algorithm that achieves good assembly results, and provides a better idea and method for solving the metagenomic data assembly problem.
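The first two steps described in the abstract (decomposing fragments into k-mers to build a De Bruijn graph, then splitting the graph into isolated subgraphs) can be illustrated with a minimal Python sketch. This is not the thesis's C++ implementation: the function names `kmers`, `build_de_bruijn`, and `connected_components` are hypothetical, reverse complements and sequencing-error removal are ignored, and the abstract does not specify the rule for removing cr-branches, so the sketch simply takes weakly connected components as a stand-in for the partitioning step.

```python
from collections import deque


def kmers(read, k):
    """Yield every length-k substring of a read, left to right."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def build_de_bruijn(reads, k):
    """Build a De Bruijn graph: nodes are (k-1)-mers, and each k-mer
    contributes one edge from its prefix to its suffix."""
    graph = {}
    for read in reads:
        for kmer in kmers(read, k):
            prefix, suffix = kmer[:-1], kmer[1:]
            graph.setdefault(prefix, set()).add(suffix)
            graph.setdefault(suffix, set())  # register nodes with no out-edges
    return graph


def connected_components(graph):
    """Split the graph into weakly connected components (assumption:
    a simplified stand-in for the partition into isolated subgraphs)."""
    # Ignore edge direction when deciding which nodes belong together.
    undirected = {node: set() for node in graph}
    for src, targets in graph.items():
        for dst in targets:
            undirected[src].add(dst)
            undirected[dst].add(src)

    seen = set()
    components = []
    for start in undirected:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:  # breadth-first search from an unvisited node
            node = queue.popleft()
            component.add(node)
            for neighbor in undirected[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components
```

For example, calling `build_de_bruijn(["ACGTAC", "CGTACA", "GGTTGG"], 4)` and passing the result to `connected_components` groups the two overlapping reads into one subgraph and leaves the unrelated third read in its own subgraph. In a real metagenomic data set, similar regions shared between species would connect such subgraphs, which is why a branch-removal step of the kind the abstract describes is needed before partitioning.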

Keywords/Search Tags:

Metagenomic, Assemble, Clustering, Algorithm, De Bruijn graph

Related items

1	Research Of Metagenomic Contigs Clustering Method Based On Improved Density Peaks
2	Algorithm Research And Application Of Multi-core Voronoi Diagram In The Plane
3	Research Of Fuzzy Clustering Method On Imbalanced Dataset And Its Application In Metagenomic Contigs Binning
4	Research On Density Peaks Clustering Algorithm Based On DNA Microarray Data And Its Application
5	Complex Network Clustering Based On Multi-objective Optimization Algorithm
6	A Dynamic Clustering Algorithm By ASA And Its Validity
7	QTM-based Spherical Voronoi Diagram Generating Algorithms And Its Application
8	Research And Application Of Fuzzy Clustering Based On Tissue Like P-system
9	Power Figure Scan Generation Algorithm
10	Size-controllable DNA Nanoribbons Assembled From Three Types Of Reusable Brick Single-strand DNA Tiles