Effectively Clustering Reads Of Metagenomes

Posted on: 2014-03-04
Degree: Master
Type: Thesis
Country: China
Candidate: R Q Liao
Full Text: PDF
GTID: 2180330434970705
Subject: Computer software and theory
Abstract/Summary:
With the development of high-throughput technologies, biological research now generates an unprecedented amount of data to analyze. Metagenomic data are sampled from animals or from natural environments such as the deep ocean and soil, and contain the genomic sequences of multiple microorganisms. Analysis of metagenomic data can provide valuable insights into problems such as human health, microbial evolution, and microbial community composition. For that reason, a growing body of bioinformatics research focuses on this area, trying to solve the fundamental problems of metagenomic data analysis.

Since most metagenomic data contain mixed sequence fragments from multiple species, the fragments must be separated into different classes before further analysis. This separation is called binning. Most existing binning methods require known reference genomes. However, most metagenomic data contain sequences from unknown species, so an effective unsupervised binning method is needed.

In this thesis, we present MCluster, an unsupervised binning method based on a clustering algorithm that simultaneously determines the weight of each feature during the clustering process. The method can effectively separate the genomic sequences of different microorganisms into different clusters. Unlike traditional supervised methods, it requires no reference genome information. Experimental results show that the method performs well on both simulated and real datasets, which makes it a promising approach to the binning of metagenomic data from unknown species.
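
The idea of clustering reads while learning a weight for each feature can be illustrated with a minimal sketch. It is not MCluster's implementation: the abstract does not specify the features or the update rules, so the sketch assumes normalized k-mer frequencies as features and swaps in a generic feature-weighted k-means whose weight update follows the W-k-means form. The names `kmer_profile` and `weighted_kmeans`, and the parameters `k`, `beta`, `n_iter` and `seed`, are hypothetical.

```python
import itertools
import random


def kmer_profile(seq, k=2):
    """Normalized k-mer frequency vector over the ACGT alphabet (assumed feature)."""
    kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing N or other symbols
            counts[j] += 1.0
    total = sum(counts) or 1.0
    return [c / total for c in counts]


def weighted_kmeans(X, n_clusters, beta=2.0, n_iter=20, seed=0):
    """Feature-weighted k-means sketch (assumption, not the thesis's algorithm).

    Each iteration assigns points, updates centers, then updates one weight
    per feature so that features with small within-cluster dispersion get
    larger weights. Requires beta > 1.
    """
    rng = random.Random(seed)
    n_feat = len(X[0])
    centers = [list(x) for x in rng.sample(X, n_clusters)]
    weights = [1.0 / n_feat] * n_feat
    labels = [0] * len(X)
    for _ in range(n_iter):
        # 1) Assign each point to the nearest center under the weighted distance.
        for i, x in enumerate(X):
            labels[i] = min(
                range(n_clusters),
                key=lambda c: sum(
                    weights[j] ** beta * (x[j] - centers[c][j]) ** 2
                    for j in range(n_feat)
                ),
            )
        # 2) Recompute each center as the mean of its members (empty clusters keep theirs).
        for c in range(n_clusters):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        # 3) Per-feature within-cluster dispersion; epsilon avoids division by zero.
        disp = [
            sum((x[j] - centers[lab][j]) ** 2 for x, lab in zip(X, labels)) + 1e-12
            for j in range(n_feat)
        ]
        # 4) Weight update: w_j = 1 / sum_t (D_j / D_t)^(1 / (beta - 1)); weights sum to 1.
        e = 1.0 / (beta - 1.0)
        weights = [
            1.0 / sum((disp[j] / disp[t]) ** e for t in range(n_feat))
            for j in range(n_feat)
        ]
    return labels, weights
```

To use the sketch, build `X = [kmer_profile(read) for read in reads]` and call `weighted_kmeans(X, n_clusters)`. The returned labels are the bin assignments, and the returned weights indicate which k-mers the clustering relied on most. The number of clusters is supplied by the caller here, which is a simplification.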
Keywords/Search Tags: metagenomics, binning, feature weighting, clustering