Research And Application Of Metagenomic Classification And Analysis Method

Posted on: 2016-11-17
Degree: Master
Type: Thesis
Country: China
Candidate: X Luo
Full Text: PDF
GTID: 2180330503977302
Subject: Biomedical engineering
Abstract/Summary:
With the development of environmental microbiology and the emergence of high-throughput sequencing technology, metagenomics provides a new approach to the study of microorganisms. It overcomes the bottlenecks of traditional microbiological methods by analyzing the genomes of environmental microorganisms directly. In recent years, numerous studies have shown that a variety of human diseases are closely linked to microbial communities, and metagenomic sample classification is an important research tool for revealing the relationship between microbial communities and their host or environment: features are extracted from metagenomic samples and combined with a supervised classification algorithm to identify the sample class. Most current metagenomic sample classification methods use whole-genome sequences of microorganisms. This thesis instead studies microbial community analysis based on 16S rRNA gene sequences, establishes a 16S rRNA gene sequence based sample classification process, and applies that process to mouse and human gut microbe data.

The prerequisite of sample classification is extracting sample features that can distinguish different states of samples. In this thesis, we studied the 16S rRNA gene sequences of different samples and used simulated data to verify the feasibility of sample features extracted from community structure. The results show that species richness, which covers the number and proportion of microbial species in the sample, is the most basic sample feature; α diversity, a refinement of species abundance that reduces the dimension of the sample features, is an important sample feature; and β diversity, which combines the independent evolution information of the community (UniFrac) with species abundance, is an ideal sample feature.

Combining the Random Forest algorithm with these three effective sample features, we set up a sample classification process based on 16S rRNA gene sequences.
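As a rough illustration of community-structure features of the kind described above, the following minimal sketch computes species richness, one α diversity measure, and one β diversity measure from per-taxon read counts. It is not the thesis's implementation. The Shannon index and the Bray-Curtis dissimilarity are assumptions chosen for simplicity; the UniFrac-based β diversity named in the abstract additionally requires a phylogenetic tree and is not reproduced here. All function names are hypothetical.

```python
import math


def richness(counts):
    """Number of taxa observed at least once in the sample."""
    return sum(1 for c in counts if c > 0)


def relative_abundance(counts):
    """Convert raw per-taxon counts to proportions that sum to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]


def shannon_index(counts):
    """Shannon index H = -sum(p * ln p), an assumed alpha diversity measure."""
    return -sum(p * math.log(p) for p in relative_abundance(counts) if p > 0)


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity on relative abundances.

    Assumed stand-in for beta diversity; unlike UniFrac it ignores phylogeny.
    Both samples must list the same taxa in the same order.
    """
    pa = relative_abundance(counts_a)
    pb = relative_abundance(counts_b)
    return 1.0 - sum(min(x, y) for x, y in zip(pa, pb))
```

Richness and the Shannon index give one number per sample, while a β diversity measure gives one number per pair of samples, so a sample's row of distances to the other samples could serve as its feature vector for a classifier.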
Through classification experiments on simulated data sets with different parameters, we compared the effects of those parameters on the accuracy of the classification process: the number of categories, intra-class variance, inter-class variance, and phylogenetic tree height. The final classification results show that even when the differences among features within a class are not obvious, namely under small intra-class variance and large inter-class variance, the classification accuracy of our process is higher than that of other classification methods. Our process also showed high classification accuracy when the evolutionary relationships were complicated. The experimental results show that our sample classification process performs well and can accurately identify metagenomic samples based on 16S rRNA sequences.

The metagenomic sample classification process based on 16S rRNA gene sequences was applied to gut microbe samples from mice and humans. For the environment-related mouse gut microbe samples, the classification accuracy was more than 88%. We concluded that more misclassification occurs when the intra-class variance of the two classes of samples is small, and that species evolutionary relationships reflect the differences between mouse gut microbe samples from different environments more clearly than the community independent evolution information (UniFrac). For the obesity-related human gut microbe samples, the classification accuracy was more than 75%.
We also concluded that misclassification often happens because the variance between the feature vectors of the obesity group and the overweight group is low. As a sample feature, community independent evolution information performs better than microbial species evolution on the obesity-related human gut microbe samples, and we believe it better reflects the differences in gut microbes among people with different BMI. We summarize the conclusions of these two groups of experiments as follows: first, intra-class variance has a large influence on classification accuracy, and more misclassification occurs if the intra-class variance of the two classes of samples is small; second, our process performs better than classification based on a support vector machine; and finally, on 16S rRNA sequencing data from different samples, our classification process is more reliable than MetaPhyl.
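The abstract refers to intra-class and inter-class variance of sample feature vectors without defining them. The sketch below shows one common way such quantities can be computed, assuming intra-class variance is the mean squared distance of samples to their own class centroid and inter-class variance is the size-weighted mean squared distance of class centroids to the overall mean. Both definitions, and the function names, are assumptions for illustration and may differ from those used in the thesis.

```python
def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def squared_distance(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def class_variances(samples_by_class):
    """Return (intra_class_variance, inter_class_variance).

    samples_by_class maps a class label to a list of feature vectors.
    Definitions are assumed, not taken from the thesis.
    """
    all_vectors = [v for vs in samples_by_class.values() for v in vs]
    grand_mean = mean_vector(all_vectors)
    n_total = len(all_vectors)

    intra = 0.0
    inter = 0.0
    for vectors in samples_by_class.values():
        centroid = mean_vector(vectors)
        # Spread of samples around their own class centroid.
        intra += sum(squared_distance(v, centroid) for v in vectors)
        # Spread of class centroids around the overall mean, weighted by class size.
        inter += len(vectors) * squared_distance(centroid, grand_mean)
    return intra / n_total, inter / n_total
```

For example, `class_variances({"obese": [[0.0], [2.0]], "overweight": [[10.0], [12.0]]})` uses made-up one-dimensional feature values and hypothetical labels; comparing the two returned numbers gives a quick sense of how separable the classes are before any classifier is trained.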
Keywords/Search Tags: metagenome, sample classification, 16S rRNA, microbial community analysis, supervised classification