
A Single Molecule and Long-Read View of the Human Transcriptome

Posted on: 2014-07-08
Degree: Ph.D
Type: Dissertation
University: Yale University
Candidate: Sharon, Donald Edmund
Full Text: PDF
GTID: 1458390005999312
Subject: Biology
Abstract/Summary:
Improving our understanding of alternative splicing is a primary goal in the field of transcriptomics. While prokaryotic genes are usually encoded in continuous stretches of RNA, eukaryotic genes often consist of a series of exons interspersed with untranslated intronic sequences. During and after transcription, those exons can be joined in different combinations to yield many different isoforms. Current RNA-sequencing methods rely on short reads (~100 bp) to identify individual splice junctions, which may offer hints to the final sequence of the mRNA, but there has been no effective or unbiased means to capture the full complexity of long RNAs containing multiple exons. Pacific Biosciences (Menlo Park, CA) has introduced a third-generation high-throughput sequencing technology that is capable of capturing the full complexity of extremely long RNAs. PacBio's single-molecule, real-time (SMRT) sequencing uses nanoscopic holes, or "zero-mode waveguides" (ZMWs), to isolate signals from base incorporation events by individual polymerase molecules. I have used this platform to sequence unfragmented poly-A-selected cDNA libraries from a complex mixture of RNAs derived from 20 different human tissues, HapMap trio cell lines, as well as other eukaryotic organisms. The long reads are free of the sequence-specific errors that have plagued second-generation platforms. This approach has allowed us to identify a novel class of spliced long intergenic non-coding RNAs (lincRNAs), phase full-length isoforms, and resolve complex genes that would be difficult for short-read platforms to handle. We have also used these long reads as a "gold standard" against which to compare in silico predictions of isoforms made using short-read sequencing data from the Illumina HiSeq platform. While isoforms can be predicted from the short-read data, many splice variants identified by the PacBio RS cannot be reconstructed using such methods.
Long-read RNA-seq holds great promise for elucidating the role of alternative splicing in many biological processes and is a boon for RNA research involving model organisms that do not yet have a draft genome. This work has demonstrated the benefits of single-molecule, long-read RNA-seq as a novel tool for investigating eukaryotic transcriptomes.
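The limitation of junction-level evidence described in the abstract can be illustrated with a toy example. The sketch below is not the dissertation's pipeline; the five-exon gene, the exon numbering, and the function names are hypothetical, and it assumes each short read spans at most one splice junction. Under those assumptions, two different mixtures of isoforms produce exactly the same set of observed junctions, so only a read covering the full transcript can tell them apart.

```python
def junctions(isoform):
    """Return the set of exon-exon junctions (adjacent exon pairs) in one isoform."""
    return set(zip(isoform, isoform[1:]))


def junction_set(isoforms):
    """Return every junction observed across a collection of isoforms."""
    observed = set()
    for isoform in isoforms:
        observed |= junctions(isoform)
    return observed


# Hypothetical gene with exons 1-5, where exons 2 and 4 are each optional.
# Each tuple is one full-length isoform, as a long read would report it.
sample_x = [(1, 2, 3, 4, 5), (1, 3, 5)]  # both optional exons kept, or both skipped
sample_y = [(1, 2, 3, 5), (1, 3, 4, 5)]  # exactly one optional exon kept

# Junction-level (short-read-like) evidence is identical for the two samples...
same_junctions = junction_set(sample_x) == junction_set(sample_y)
# ...even though the underlying full-length isoforms differ.
same_isoforms = set(sample_x) == set(sample_y)

print("same junctions:", same_junctions)
print("same isoforms:", same_isoforms)
```

In this toy case the choice at exon 2 cannot be linked to the choice at exon 4 from junctions alone, which is the kind of phasing that full-length reads provide directly.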
Keywords/Search Tags: Long-read