Method Study For Classification Of Metagenomic Samples

Posted on: 2018-03-03
Degree: Master
Type: Thesis
Country: China
Candidate: F D Cheng
Full Text: PDF
GTID: 2310330542452825
Subject: Biomedical engineering
Abstract/Summary:
Metagenomics, also referred to as environmental genomics or community genomics, is the study of genetic material recovered directly from environmental samples. It has become a common method of microbial research because most microbes cannot be cultured independently. Metagenomic data consist of sequences from many different microbial genomes, which makes analysis more difficult than for a single genome. With the development of high-throughput sequencing technologies and the resulting surge in metagenomic data, classification methods for metagenomes have become an active research area. In this thesis, we focus on the classification of human gut metagenomic data, develop two different algorithms, and attempt to determine the disease phenotypes of the hosts from analysis of their gut metagenomes. The work can be summarized in three parts.

1. Classification features of metagenomic samples were studied, and a new self-alignment feature was proposed. A new feature called Intrinsic Correlation of Oligonucleotides (ICO), which can distinguish among microbial species, was identified and applied not only to single species but also to metagenomic samples. In addition, we introduced a new self-alignment feature of metagenomic samples to help classify them.

2. A classification algorithm named DectICO was built on ICO. The method uses ICO to characterize the samples, combines it with the Kernel Partial Least Squares (KPLS) feature selection algorithm to form a dynamic filtering mechanism for candidate features, and employs a Support Vector Machine (SVM) to classify the metagenomic samples. Six simulated data sets of metagenomic sequences and one real-world data set were used to evaluate the classification performance of DectICO. The results showed that DectICO performs well on complex data sets and performs better when longer oligonucleotides are used as features. In addition, selecting ICO features with the dynamic KPLS algorithm improved classification performance significantly. Finally, DectICO was compared with the RSVM-based (Recursive Support Vector Machine) classification algorithm, since both are supervised classification algorithms based on sequence features; the results demonstrated that DectICO classifies metagenomic samples more accurately.

3. A new classification method based on the self-alignment feature was proposed. It extracts the characteristics relevant to sample classification by exploiting contigs (contiguous genomic stretches composed of overlapping reads) assembled from the reads (short sequences) of the original data set. For this algorithm, three concepts were proposed: the Comparison Database, the Match Score of a sample, and the Independent Score of a contig; a self-alignment database was also established and optimized. The method was evaluated on a type 2 diabetes mellitus (T2D) data set and outperformed the DectICO and RSVM-based methods, indicating that the classification match score and the independent statistical score of the contigs are effective self-alignment features.
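
As a rough illustration of what an oligonucleotide-based sequence feature looks like, the sketch below computes plain normalized k-mer frequencies pooled over the reads of one sample. This is not the ICO feature (the abstract does not define it) and it does not reproduce DectICO; the function name `kmer_frequencies`, the parameter `k`, and the toy reads are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(reads, k=2):
    """Return normalized k-mer frequencies pooled over all reads in a sample.

    Assumption: a simple stand-in for oligonucleotide features, not ICO.
    """
    alphabet = "ACGT"
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip windows containing ambiguous bases such as N.
            if all(base in alphabet for base in kmer):
                counts[kmer] += 1
    total = sum(counts.values())
    # Enumerate every possible k-mer in a fixed order so that feature
    # vectors from different samples line up position by position.
    all_kmers = ("".join(p) for p in product(alphabet, repeat=k))
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


sample = ["ACGTAC", "GGNTA"]  # hypothetical toy reads
features = kmer_frequencies(sample, k=2)
```

A vector of this kind, with one entry per possible oligonucleotide, is the sort of input that a feature selection step and a classifier such as an SVM could then consume; longer `k` gives 4^k entries, which is one reason a selection step becomes useful.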
Keywords/Search Tags:metagenome, sequence feature, sample classification algorithm