
Statistical Model On Next Generation Sequencing

Posted on: 2011-11-28
Degree: Master
Type: Thesis
Country: China
Candidate: W J Tang
Full Text: PDF
GTID: 2120360308971570
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Among mathematical models of shotgun sequencing, the Lander-Waterman model and Roach's exact model are the two main ones. The former resolved the fundamental questions of shotgun sequencing, and the latter proposed a more accurate approach based on order statistics. However, both models target traditional sequencing with long reads and low depth of coverage, so they discuss only islands of depth 1, namely contiguous regions covered by at least one read. Although their results remain valid for current Next-Generation Sequencing (NGS), these models do not adequately reflect the intrinsic characteristics of NGS: ultra-short reads and higher redundancy. In de novo assembly algorithms for NGS based on the de Bruijn graph, the most commonly used technique for handling sequencing errors is to filter out erroneous k-tuples using a predetermined threshold. The choice of threshold therefore directly affects the quality of the assembly, yet so far there is no strong theoretical basis for determining it. In this thesis, we propose a new mathematical model for NGS that addresses the statistical problems of generalized islands at different levels of depth in shotgun sequencing. Some theoretical results of the model, derived using stochastic process theory, agree well with Monte Carlo simulation results. The model can therefore provide theoretical guidance for NGS and for short-sequence assembly algorithms.
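As a rough illustration of the kind of Monte Carlo check the abstract describes, the following sketch places reads uniformly at random on a circular genome and compares the simulated fraction of positions covered at depth at least d with the standard Poisson approximation, where the coverage is c = N L / G. It is not the thesis's model or code: the function names, the circular-genome simplification, and the parameter values are all assumptions made for illustration, and the island count is reported only empirically.

```python
import math
import random


def simulate_depth(genome_len, read_len, n_reads, seed=0):
    """Per-base depth for reads placed uniformly on a circular genome.

    Assumes read_len <= genome_len. All names here are hypothetical.
    """
    rng = random.Random(seed)
    diff = [0] * (genome_len + 1)  # difference array of depth changes
    for _ in range(n_reads):
        start = rng.randrange(genome_len)
        end = start + read_len
        diff[start] += 1
        if end <= genome_len:
            diff[end] -= 1
        else:
            # Read wraps around: covers [start, G) and [0, end - G).
            diff[genome_len] -= 1
            diff[0] += 1
            diff[end - genome_len] -= 1
    depth, running = [], 0
    for i in range(genome_len):
        running += diff[i]
        depth.append(running)
    return depth


def fraction_at_depth(depth, d):
    """Fraction of positions covered by at least d reads."""
    return sum(1 for x in depth if x >= d) / len(depth)


def poisson_tail(c, d):
    """P(X >= d) for X ~ Poisson(c), the usual approximation for depth."""
    below = sum(math.exp(-c) * c**j / math.factorial(j) for j in range(d))
    return 1.0 - below


def count_islands(depth, d):
    """Number of maximal circular runs of positions with depth >= d."""
    n = len(depth)
    # depth[i - 1] with i == 0 reads the last position, closing the circle.
    starts = sum(1 for i in range(n) if depth[i] >= d and depth[i - 1] < d)
    if starts == 0 and depth[0] >= d:
        return 1  # the whole circle is one island
    return starts


if __name__ == "__main__":
    G, L, N = 100_000, 50, 10_000  # assumed values, coverage c = 5
    c = N * L / G
    depth = simulate_depth(G, L, N, seed=1)
    for d in (1, 2, 3, 5):
        print(d, fraction_at_depth(depth, d), poisson_tail(c, d),
              count_islands(depth, d))
```

Raising d in this sketch shows how islands at higher depth fragment and shrink relative to the depth-1 islands treated by the classical models, which is the regime the proposed model is meant to describe.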
Keywords/Search Tags: Genomic Sequencing, DNA Fragment Assembly, Shotgun Sequencing, Sequencing technology, Sequencing strategy