Taxonomic Classification Species Diversity Of Metagenomic DNA Fragment

Posted on: 2014-02-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T Hou
Full Text: PDF
GTID: 1220330395496624
Subject: Control theory and control engineering
Abstract/Summary:
Advances in the throughput and cost-efficiency of sequencing technology are fueling a rapid increase in the number and size of metagenomic DNA sets being generated. Bioinformatics faces the problem of how to handle and analyze these large DNA sets efficiently. However, directly sequenced metagenomes are generally very complex: they often consist of DNA fragments from numerous genomes, possibly from different domains, and most of the DNA sequences are unknown. Therefore, one of the major challenges in metagenomic data analysis is to predict the taxonomic origin of the DNA fragments. This process is called taxonomic classification, or binning. Depending on the research need, binning can be performed at different taxonomic levels, from Kingdom (the highest level) to Species (the lowest level). Several classifiers have been developed to assess the source organism of DNA fragments from a metagenome. However, most of these methods cannot achieve the classification accuracy required by current high-complexity metagenomic sets.

In this dissertation, we present methods, based on pattern recognition, to identify the species diversity of variable-length DNA fragments within a metagenome. Three key points are studied in depth:

(1) Extracting the optimized composition feature vector from DNA fragments

A large number of genome sequences have been produced, and how to describe and distinguish them accurately is becoming a key issue in taxonomy. We proposed an efficient algorithm to filter out most genome fragments that are horizontally transferred, and extracted a new genome vector (GV). To highlight the power of GV, we applied it to identify prokaryotes and their variable-size genome fragments.
The results indicated that the new vector, used as a species tag, represents a genome well after the abnormal, horizontally transferred genome fragments are filtered out.

(2) Taxonomic classification of metagenomic DNA fragments with the DS-Binning model

Some classifiers have been developed to assess the source organism of DNA fragments from a metagenome. However, most existing classifiers suffer from low classification accuracy at lower taxonomic levels. One reason is that the classifiers cannot accurately discriminate data from different organisms, especially when the boundary between organisms is unclear. To improve classification accuracy, we designed the DS-Binning method to predict the taxonomic origin of metagenomic DNA fragments. The method is based on the support vector data description algorithm and the phylogenetic tree. The results indicated that the method avoids some misidentifications and missed identifications.

(3) Taxonomic classification of metagenomic data at the species and genus levels using a weighted SVDD (WSVDD) model

Several composition-based methods exist. However, at the genus and species levels, most of them cannot achieve the classification accuracy required by current high-complexity metagenomic sets. This difficulty is strongly influenced by several factors, such as genome length, the reliability of the genome composition vector, and the discriminating capability of the classifier that describes the reference genomic data. We observed that existing composition-based classifiers (such as SVMs, kernelized nearest neighbor, and the naive Bayes classifier) cannot describe the genomic data effectively in the presence of noise associated with lateral gene transfer (LGT) in the reference genomic data.
However, it is well known that the reference genomic data (bacterial and archaeal genomes) usually contain a portion of genomic fragments from LGT, which prevents the development of classifiers with perfect accuracy.

To overcome this difficulty, we presented a novel strategy, based on a weighted support vector domain description (WSVDD) model, to obtain better classification accuracy at the genus and species levels. The WSVDD model can overcome the interference from LGT in the training genomic data, and the resulting classifier is therefore more accurate.

We believe this research will promote the study of biodiversity, population and evolutionary relationships, functional activity, and mutual collaboration relations, among other topics. It will also lay an effective theoretical groundwork for the future development of electronic products for studying metagenomic problems.
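
As a minimal illustration of what a composition feature vector for a DNA fragment looks like, the Python sketch below computes normalized k-mer frequencies. It is a generic oligonucleotide-frequency vector, not the genome vector (GV) proposed in this work, whose construction and filtering step are not specified in the abstract. The function name kmer_composition, the default k = 4, and the choice to skip windows containing ambiguous bases are all assumptions made for the example.

```python
from itertools import product


def kmer_composition(fragment, k=4):
    """Return the normalized k-mer frequency vector of a DNA fragment.

    Hypothetical helper for illustration only; it is not the GV of the
    dissertation and does no filtering of horizontally transferred regions.
    """
    alphabet = "ACGT"
    # Fixed lexicographic ordering so vectors from different fragments align.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)

    seq = fragment.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Windows with ambiguous bases (for example N) are not valid k-mers
        # and are skipped.
        if window in counts:
            counts[window] += 1
            total += 1

    if total == 0:
        # Fragment too short or entirely ambiguous: return an all-zero vector.
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


vector = kmer_composition("ACGTACGTNACGT", k=2)
print(len(vector))  # 16, one entry per possible 2-mer
```

Because the frequencies are normalized by the number of valid windows, fragments of different lengths give vectors on the same scale, which is what lets a composition-based classifier compare variable-length fragments.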
Keywords/Search Tags: Metagenomics, Metagenome, Taxonomic classification, Support vector domain description (SVDD), Lateral gene transfer (LGT)