The Next Generation Sequencing Data Processing

Posted on: 2012-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhang
Full Text: PDF
GTID: 2218330362459250
Subject: Computer software and theory
Abstract/Summary:
In recent years, driven by the rapid development of Next Generation Sequencing (NGS) technology, more and more companies have launched their own sequencing platforms and instruments, such as the Genome Analyzer (Illumina, San Diego, USA), 454-FLX (Roche, Basel, Switzerland) and SOLiD (Applied Biosystems, California, USA). As a result, gene sequencing is no longer confined to specialized laboratories. Many research groups and researchers are entering the field, and NGS data processing faces growing demands and challenges. Researchers are no longer satisfied with the basic pipelines provided by the instrument manufacturers. Many open and flexible NGS data processing pipelines have been developed in recent years, such as BING (Kriseman, 2010) and Swift, but they are all based on Illumina data. In this thesis, we carefully review the NGS data processing workflow and design a complete pipeline and its algorithms, from gene cluster locating and image registration to base calling.

We found that the raw data processing stage in existing NGS pipelines is simplistic or even absent. These pipelines use general-purpose algorithms such as level set segmentation, or simply the Laplace operator, to locate the clusters. Careful analysis showed that these algorithms cannot locate the position of each cluster in the fluorescence image exactly. We therefore redesigned the processing algorithm and present it here as NRDPT (NGS Raw Data Processing Tool).

Unlike existing methods, we use edge-based Hough transforms for cluster positioning, which effectively improves positioning accuracy. The two-step registration algorithm designed in this thesis also greatly reduces the time cost (about 9 times faster).

For base calling, existing studies are based on data produced by the Illumina sequencing platform. These methods are mainly designed to correct the phasing (phase disorder) problems caused by the biochemical process. In some of the newer sequencing methods (such as SOLiD), however, these problems do not exist. We discuss these problems and consider several strategies, and then describe a base-calling method based on the reactions used in PSTAR-II, which produced good results.
Keywords/Search Tags:Next Generation Sequencing, DNA sequencing, Image processing, Image alignment, Base calling, Image analysis, Signal processing
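
The abstract names edge-based Hough transforms as the cluster positioning technique but gives no implementation details. As a rough illustration only, and not the NRDPT algorithm itself, the following Python sketch runs a fixed-radius circular Hough transform on a synthetic spot. The gradient-threshold edge detector, the known cluster radius, the function names and all parameter values are assumptions made for this example.

```python
import numpy as np

def edge_map(image, threshold):
    """Binary edge map from gradient magnitude (stand-in for any edge detector)."""
    gy, gx = np.gradient(image.astype(float))
    return np.hypot(gx, gy) > threshold

def hough_circle_centers(edges, radius, n_angles=64):
    """Vote for circle centres of a fixed, assumed radius from every edge pixel."""
    h, w = edges.shape
    acc = np.zeros((h, w), dtype=np.int32)
    ys, xs = np.nonzero(edges)
    thetas = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    for t in thetas:
        # Each edge pixel votes for the point one radius away in direction t.
        cy = np.rint(ys - radius * np.sin(t)).astype(int)
        cx = np.rint(xs - radius * np.cos(t)).astype(int)
        ok = (cy >= 0) & (cy < h) & (cx >= 0) & (cx < w)
        # np.add.at accumulates correctly when the same cell is hit repeatedly.
        np.add.at(acc, (cy[ok], cx[ok]), 1)
    return acc

def synthetic_spot(shape, center, radius):
    """Filled disk used here as a toy 'cluster' (hypothetical test data)."""
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    inside = (yy - center[0]) ** 2 + (xx - center[1]) ** 2 <= radius ** 2
    return inside.astype(float)

image = synthetic_spot((64, 64), center=(30, 22), radius=6)
acc = hough_circle_centers(edge_map(image, 0.25), radius=6)
# The accumulator maximum is the estimated cluster centre (row, column).
peak = np.unravel_index(np.argmax(acc), acc.shape)
print("estimated centre:", peak)
```

On real fluorescence images one would presumably need a more robust edge detector, a range of candidate radii, and non-maximum suppression to separate neighbouring clusters; none of that is shown here.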