Novel strategies to increase read length and accuracy for DNA sequencing by synthesis

Posted on: 2011-02-04
Degree: Ph.D.
Type: Thesis
University: Columbia University
Candidate: Yu, Lin
Full Text: PDF
GTID: 2464390011972155
Subject: Engineering
Abstract/Summary:
The completion of the Human Genome Project has increased the need for high-throughput DNA sequencing technologies aimed at uncovering the genomic contributions to diseases. The DNA sequencing by synthesis (SBS) approach has shown great promise as a new platform for decoding the genome. This thesis focuses on the development and improvement of a chip-based four-color DNA SBS platform using molecular engineering approaches. In this approach, the four nucleotides (A, C, G, T) are modified as cleavable fluorescent nucleotide reversible terminators (CF-NRTs) by tethering a cleavable fluorophore to the base and capping the 3'-OH with a small, chemically reversible moiety, so that the nucleotide analogues are still recognized as substrates by DNA polymerase. First, we explored the potential of using an azido-modified group for nucleotide modification. Based on our established rationale for nucleotide reversible terminator (NRT) design, we synthesized a complete set of NRTs capped at the 3' position with an azidomethyl group (3'-O-N3-dATP, 3'-O-N3-dCTP, 3'-O-N3-dGTP, 3'-O-N3-dTTP). Testing and optimization showed that these NRTs were good substrates for DNA polymerase. We then worked out an optimal chemical cleavage condition that removes the azidomethyl group capping the 3'-OH of the nucleotide analogues under conditions compatible with DNA, allowing the next NRT to be incorporated in the subsequent polymerase reaction. We then designed and synthesized two sets of azido-modified CF-NRTs for applications in SBS. The four CF-NRTs of the first set (3'-O-N3-dNTP-azidomethylbenzoyl-fluorophores) were capped at the 3'-OH with an azidomethyl group identical to that of the NRTs and contained a substituted 2-azidomethylbenzoyl linker to tether a fluorophore. These CF-NRTs were used to produce four-color de novo DNA sequencing data on a chip based on our sequencing by synthesis approach.
After each round of sequencing, both the fluorophores linked to the CF-NRTs and the 3'-azidomethyl group on the DNA extension products generated by incorporating 3'-O-N3-dNTP-azidomethylbenzoyl-fluorophores were removed using a TCEP [Tris(2-carboxyethyl)phosphine] cleavage solution. This one-step dual-cleavage process for reinitiating the polymerase reaction increased the overall SBS efficiency. After confirming the feasibility of implementing azido-modified CF-NRTs in SBS, we synthesized a second set of CF-NRTs (3'-O-N3-dNTP-N3-fluorophores) to further improve and optimize the sequencing process. During the incorporation stage of SBS, a mixture of CF-NRTs and NRTs was used to simultaneously extend the primer strands of various linear target DNA templates. This approach led to a more efficient DNA polymerase reaction, since the smaller 3'-O-N3-dNTPs were much easier to incorporate. Moreover, primers extended with NRTs resembled nascent strands of DNA that had no traces of modification after cleavage of the 3'-azidomethyl capping group. After the incorporation reaction, two separate capping steps, first with 3'-O-N3-dNTPs and then with ddNTPs, were performed to synchronize all the templates on the surface. Without these precautionary synchronization procedures, mixed fluorescent signals would prevent identification of the correctly incorporated nucleotide. Hence, we have successfully addressed one of the key drawbacks of SBS: the miscalling of bases due to lagging signals. In addition, since both the 3'-O-N3-dNTP-N3-fluorophores and the 3'-O-N3-dNTPs are reversible terminators, which allow each base to be sequenced in a serial manner, they make it possible to determine homopolymeric regions of DNA accurately. Finally, we developed a novel template walking strategy to increase read length for DNA SBS.
The template walking method involved resetting the sequencing start site by extending the sequencing primer with three natural nucleotides and one NRT, so that the polymerase reaction was temporarily paused when the NRT was incorporated. Upon restoring the 3'-OH group of the NRT incorporated into the primer via cleavage, the next cycle of walking could be carried out, and this was repeated until the entire previously sequenced portion of the template was skipped. We have successfully demonstrated the integration of this template walking strategy into our four-color DNA SBS platform by performing one round of SBS, four cycles of template walking reactions, and then a second round of SBS. Through this effort, we were able to sequence a linear DNA template in its entirety, nearly doubling the read length of our previous sequencing results. We are also taking advantage of the massive throughput of a next-generation sequencer based on our SBS technology to conduct a digital gene expression study of the Aplysia central nervous system, in an ongoing project that explores the molecular mechanism of long-term memory formation.
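The walking scheme described above can be illustrated with a toy model. The sketch below is not the thesis protocol: it assumes the strand is written as the bases the polymerase adds (so template complementarity is already folded in), that G is the one nucleotide supplied as an NRT, and that walking restarts from a fresh primer at position 0. The function names and the example sequence are hypothetical.

```python
# Toy model of one SBS round, template walking, and a second SBS round.
# Assumptions (not from the thesis): the strand lists the bases to be
# incorporated 5'->3', the NRT base is G, and walking starts at position 0.

def sbs_round(strand, start, n_cycles):
    """Read one base per cycle (incorporate, image, cleave) from `start`."""
    end = min(start + n_cycles, len(strand))
    return strand[start:end], end


def walk_cycle(strand, pos, terminator_base):
    """Extend with natural nucleotides until the single NRT base is
    incorporated, which pauses the polymerase until its 3'-OH is restored."""
    while pos < len(strand):
        base = strand[pos]
        pos += 1
        if base == terminator_base:
            break  # NRT incorporated: extension stops here for this cycle
    return pos


def sequence_with_walking(strand, read_len, terminator_base="G"):
    """First SBS round, walk past the region already read, second SBS round."""
    first_read, first_end = sbs_round(strand, 0, read_len)
    pos, cycles = 0, 0  # assumed reset to the original primer position
    while pos < first_end:
        pos = walk_cycle(strand, pos, terminator_base)
        cycles += 1
    second_read, _ = sbs_round(strand, pos, read_len)
    return first_read, cycles, pos, second_read


if __name__ == "__main__":
    # Hypothetical 20-base strand, chosen only for illustration.
    print(sequence_with_walking("ATCGTTAGCCATGACTTGCA", 8))
```

In this toy model each walking cycle advances to just past the next G, so the second round starts at the first G-terminated position at or beyond the end of the first read. Depending on the sequence, that can overshoot and leave a few bases unread between the two rounds.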
Keywords/Search Tags: DNA, SBS, Read length, NRT, Polymerase reaction, Template walking, CF-NRTs, 3'-O-N