Computational analysis of high throughput sequencing data - Applications to DNA and RNA studies

Posted on: 2011-04-02
Degree: Ph.D
Type: Dissertation
University: University of Virginia
Candidate: Malhotra, Ankit
Full Text: PDF
GTID: 1440390002968449
Subject: Chemistry
Abstract/Summary:
For a long time, the state-of-the-art DNA sequencing technology was capillary-based Sanger sequencing. However, the advent of high-throughput massively parallel sequencing (MPS) technologies in 2005 has revolutionized the field of genomics. It has given us the tools to investigate a whole genome's worth of DNA sequence in a very short period of time at a relatively low cost. Both the cost and the amount of sequencing have kept pace with Moore's Law for the last few years and are expected to continue to do so into the next decade, making the dream of personal genetics possible. It has been a challenge to come up with innovative and optimal solutions to the technical and bioinformatic problems of interpreting data from MPS. The new molecular methods that form the basis of these technologies introduce new biases that have to be addressed in our analysis. The vast amount of sequence data generated poses its own statistical and computational challenges.

This dissertation describes my efforts to develop computational and analytical methods for analyzing sequencing data from large-scale genomic studies, with a focus on understanding the molecular basis of cancers.

In the first part, I present two methods, AbLink and AbCNV, for studying genome-wide structural variation using high-throughput sequencing. AbLink and AbCNV are computational pipelines that rely on the sequencing platforms to investigate the genome and can predict chromosomal rearrangement events such as recombinations, insertions, deletions and inversions, as well as genomic loci involved in copy number variations (CNVs). In the second part, I present a novel method to identify specific gene fusions in multiple patient samples, with application to diagnostic and prognostic analyses. In the third part, I present a study of short RNA (sRNA) populations from prostate cancer cell lines to identify a miRNA signature for androgen independence and to identify a new class of sRNAs.
Keywords/Search Tags: Sequencing, DNA, Computational, Data