
Integrated analysis of partial sampling techniques in bioinformatics

Posted on: 2011-02-24
Degree: Ph.D.
Type: Thesis
University: Yale University
Candidate: Du, Jiang
Full Text: PDF
GTID: 2448390002952238
Subject: Biology
Abstract/Summary:
With the development of microarrays and, more recently, next-generation sequencing technologies, genomics researchers have been able to conduct large-scale, high-throughput experiments at the DNA level, both to investigate the abundance of different gene transcripts in the cell and to identify structural variants in individual genomes. The data from such experiments are usually signal intensities or sequence contents of DNA fragments, which can be viewed as partially observed samples from a pool of complete objects (e.g., short DNA fragments from a mixture of full-length transcript sequences). Moreover, these partial samples can be obtained via different technologies, each with its own characteristic error rate, sampling bias, and per-sample cost. This thesis describes methods for the integrated analysis of such samples in several problems: computational frameworks are established to parameterize statistical models quantitatively, and efficient algorithms are designed to estimate the variance of the method's accuracy. Both simulation and analytical methods are developed to find the optimal low-cost integration of different sampling techniques in each experimental design. The specific problems considered are (1) systematically selecting unlabeled DNA regions for validation in order to train a predictive model, (2) integrated analysis of fragmented DNA sequences to estimate the distribution of full-length gene transcripts, and (3) conducting efficient simulations to model the local de novo assembly process in individual genome re-sequencing. A key aspect of some of these problems is establishing fast algorithms to compute a corresponding Fisher-information-based measure for performance estimation.
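As a rough illustration of how a Fisher-information-based measure can score a mix of sampling technologies, the sketch below uses a toy model that is an assumption of this example, not the thesis's actual formulation: two isoforms with relative abundances theta and 1 - theta, and reads that fall into discrete types with known per-isoform probabilities. The function names, the compatibility tables, and the read counts are all hypothetical. For independent reads, information adds across technologies, and the inverse of the total gives the Cramér-Rao lower bound on the variance of an unbiased abundance estimate.

```python
def fisher_information_per_read(theta, compat):
    """Fisher information about theta carried by a single read.

    theta  : assumed relative abundance of isoform A (isoform B has 1 - theta)
    compat : list of (P(read type | A), P(read type | B)) pairs, one per read type
    """
    info = 0.0
    for p_a, p_b in compat:
        # Probability of observing this read type under the mixture.
        p = theta * p_a + (1.0 - theta) * p_b
        if p > 0.0:
            # (d p / d theta)^2 / p, summed over read types.
            info += (p_a - p_b) ** 2 / p
    return info


def variance_lower_bound(theta, designs):
    """Cramer-Rao bound for a design that pools several technologies.

    designs : list of (compat, n_reads) pairs; reads are assumed independent,
              so the information from each technology simply adds.
    """
    total = sum(n * fisher_information_per_read(theta, compat)
                for compat, n in designs)
    return 1.0 / total


# Hypothetical technologies: short reads are often shared between isoforms,
# long reads always identify their isoform of origin.
short_reads = [(0.5, 0.5), (0.5, 0.0), (0.0, 0.5)]
long_reads = [(1.0, 0.0), (0.0, 1.0)]

bound = variance_lower_bound(0.5, [(short_reads, 100), (long_reads, 50)])
```

Dividing each technology's per-read information by its per-sample cost gives a crude way to compare designs under a fixed budget; a realistic analysis would also need to model the error rates and sampling biases mentioned above.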
Keywords/Search Tags: Integrated analysis, DNA, Sampling