
Computational and statistical approaches in RNA sequencing analysis

Posted on: 2010-05-09 | Degree: Ph.D. | Type: Dissertation
University: Stanford University | Candidate: Jiang, Hui | Full Text: PDF
GTID: 1440390002487649 | Subject: Biology
Abstract/Summary:
The recent development of next-generation sequencing technology has brought new opportunities and challenges to the field of bioinformatics. By combining sequencing with other experimental techniques, researchers have developed a number of approaches for exploring complex biological systems; among these, RNA sequencing (RNA-Seq) is considered a very powerful tool for transcriptomics. For RNA-Seq data processing, we developed a tool called SeqMap, which finds every location in a reference genome from which each sequencing read may have originated. It can map millions of sequencing reads to a genome of several billion nucleotides in a few hours on a desktop PC. During mapping, multiple nucleotide substitutions and insertions/deletions in the reads can be tolerated and detected. For RNA-Seq data analysis, we developed a statistical method for estimating isoform-specific gene expression from massively parallel RNA sequencing data. We count the number of reads falling into each exonic region of a gene and use a joint Poisson model to describe the joint distribution of these counts. It can be shown that, under certain assumptions, the log-likelihood function of the parameters is always concave, so the maximum likelihood estimate (MLE) can be computed easily with simple iterative methods. We also developed a Bayesian method, based on importance sampling from the posterior distribution, to draw inferences about the parameters. Finally, we investigate the extension of our statistical method to paired-end RNA-Seq.
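To make the concavity claim and the "simple iterative methods" concrete, here is a minimal sketch under stated assumptions. It assumes each exonic-region count n_i is an independent Poisson variable with mean λ_i = w · l_i · Σ_j a_ij θ_j, where l_i is the region length, a_ij = 1 if isoform j contains region i (else 0), w is a sequencing-depth scale, and θ_j ≥ 0 is the expression of isoform j. This parameterization is an illustrative assumption, not necessarily the dissertation's exact model. Because each λ_i is linear in θ and the logarithm is concave, the log-likelihood Σ_i (n_i log λ_i - λ_i) is concave in θ. The sketch maximizes it with the classical EM multiplicative update for Poisson linear models, a swapped-in choice of iterative method; the names `build_design`, `poisson_loglik` and `em_isoform_mle` are hypothetical.

```python
import math
from typing import List, Sequence


def build_design(lengths: Sequence[float], membership: Sequence[Sequence[int]],
                 w: float = 1.0) -> List[List[float]]:
    """Hypothetical helper: c[i][j] = w * l_i * a_ij, the expected reads in
    region i per unit expression of isoform j (assumed model)."""
    return [[w * l * a for a in row] for l, row in zip(lengths, membership)]


def poisson_loglik(theta: Sequence[float], design: Sequence[Sequence[float]],
                   counts: Sequence[int]) -> float:
    """Poisson log-likelihood of the region counts, dropping -sum(log n_i!)."""
    total = 0.0
    for row, n in zip(design, counts):
        lam = sum(c * t for c, t in zip(row, theta))
        if lam <= 0.0:
            if n > 0:
                return -math.inf  # reads observed where the model predicts none
            continue
        total += n * math.log(lam) - lam
    return total


def em_isoform_mle(design: Sequence[Sequence[float]], counts: Sequence[int],
                   n_iter: int = 5000, tol: float = 1e-12) -> List[float]:
    """Maximize the concave log-likelihood with the EM multiplicative update
    theta_j <- theta_j * (sum_i c_ij * n_i / lambda_i) / (sum_i c_ij)."""
    n_iso = len(design[0])
    col_sums = [sum(row[j] for row in design) for j in range(n_iso)]
    if any(s <= 0.0 for s in col_sums):
        raise ValueError("every isoform must cover at least one region")
    theta = [1.0] * n_iso  # any strictly positive starting point works
    for _ in range(n_iter):
        lam = [sum(c * t for c, t in zip(row, theta)) for row in design]
        new_theta = []
        for j in range(n_iso):
            ratio = sum(row[j] * n / l
                        for row, n, l in zip(design, counts, lam) if l > 0.0)
            new_theta.append(theta[j] * ratio / col_sums[j])
        converged = max(abs(a - b) for a, b in zip(new_theta, theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta


# Two isoforms: region 0 is shared, region 1 belongs only to isoform 0,
# and region 2 belongs only to isoform 1.
design = build_design([100.0, 200.0, 150.0], [[1, 1], [1, 0], [0, 1]])
counts = [300, 400, 150]  # exactly the expected counts for theta = (2, 1)
theta_hat = em_isoform_mle(design, counts)
print([round(t, 4) for t in theta_hat])  # [2.0, 1.0]
```

Because the counts equal the expected counts for θ = (2, 1) and the design columns are linearly independent, that point is the unique maximizer, and the update recovers it. The multiplicative form keeps every θ_j non-negative without an explicit projection step.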
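The Bayesian step can be illustrated in the same spirit. The sketch below is a generic self-normalized importance sampler, not the dissertation's specific procedure. It assumes independent Poisson region counts with mean Σ_j c_ij θ_j, a flat prior on θ_j ≥ 0 (so the posterior is proportional to the likelihood), and an independent gamma proposal for each θ_j centred on a rough guess such as the MLE. The proposal shape of 10, the sample size, and the names `log_likelihood`, `gamma_logpdf` and `importance_posterior_mean` are hypothetical choices.

```python
import math
import random
from typing import List, Sequence


def log_likelihood(theta: Sequence[float], design: Sequence[Sequence[float]],
                   counts: Sequence[int]) -> float:
    """Poisson log-likelihood of region counts, constant term dropped."""
    total = 0.0
    for row, n in zip(design, counts):
        lam = sum(c * t for c, t in zip(row, theta))
        if lam <= 0.0:
            if n > 0:
                return -math.inf
            continue
        total += n * math.log(lam) - lam
    return total


def gamma_logpdf(x: float, shape: float, scale: float) -> float:
    """Log density of a Gamma(shape, scale) distribution at x > 0."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def importance_posterior_mean(design: Sequence[Sequence[float]], counts: Sequence[int],
                              centre: Sequence[float], shape: float = 10.0,
                              n_samples: int = 20000, seed: int = 0) -> List[float]:
    """Self-normalized importance-sampling estimate of the posterior mean of theta."""
    rng = random.Random(seed)
    samples, log_w = [], []
    for _ in range(n_samples):
        # Draw theta_j ~ Gamma(shape, scale=centre_j/shape), whose mean is centre_j.
        theta = [rng.gammavariate(shape, m / shape) for m in centre]
        log_q = sum(gamma_logpdf(t, shape, m / shape) for t, m in zip(theta, centre))
        samples.append(theta)
        # Flat prior: log posterior = log likelihood + constant.
        log_w.append(log_likelihood(theta, design, counts) - log_q)
    top = max(log_w)  # subtract the maximum before exponentiating to avoid overflow
    weights = [math.exp(lw - top) for lw in log_w]
    total = sum(weights)
    return [sum(wt * s[j] for wt, s in zip(weights, samples)) / total
            for j in range(len(centre))]


# One isoform, one region: n = 50 reads, c = 10 expected reads per unit theta.
post_mean = importance_posterior_mean([[10.0]], [50], centre=[5.0])
print(round(post_mean[0], 2))  # should be close to (50 + 1) / 10 = 5.1
```

In this one-isoform check the flat-prior posterior is Gamma(n + 1, rate c), so the estimate can be compared with the exact mean 5.1. A proposal somewhat wider than the posterior keeps the weights well behaved; one much narrower than the posterior would let a few samples dominate the estimate.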
Keywords/Search Tags: Sequencing, Statistical, RNA-Seq