Statistical designs and algorithms for mapping cancer genes

Posted on: 2010-06-02
Degree: Ph.D.
Type: Dissertation
University: University of Florida
Candidate: Li, Yao
GTID: 1444390002475716
Subject: Biology
Abstract/Summary:
The identification of genes that are directly involved in tumor initiation and maintenance is instrumental to understanding the phenotypic variation of cancer and, ultimately, to designing therapeutic drugs to treat the disease. In recent years, the completed genome sequences of humans and cancers have markedly enhanced cancer gene identification. The overall goal of this dissertation is to develop a suite of statistical tools for identifying cancer genes from the rapidly growing body of sequence data. These tools are founded on the latest discoveries about the genetic and developmental roots of cancer formation, including somatic mutations, aneuploid induction, epigenetic modifications, transgenerational imprinting, copy number variants, and host-tumor genetic interactions. New statistical methods and algorithms are developed to integrate each of these discoveries. A disequilibrium model, built on differences in DNA structure and sequence between the human and cancer genomes, is formulated to identify and test the genetic mutations, or "drivers", that cause cancer. A quantitative model is derived to unravel the aneuploidy control of cancer and to estimate the genetic effects of aneuploid loci on cancer risk. Based on a widely used three-generation design, a two-stage hierarchical model is developed to estimate and test the transgenerational alteration of genetic effects and to identify genetic imprinting effects that arise from different parental origins of the same allele. This hierarchical model allows the characterization of genetic interactions between additive and dominant effects and imprinting effects over generations. Cancer susceptibility may be controlled not only by host genes and mutated genes in cancer cells, but also by epistatic interactions between genes from the host and cancer genomes. A model is derived to estimate these genome-genome interactions between host DNA and cancer DNA.
Models for cancer gene identification must solve missing-data problems, because cancer genes and their incidence in a natural population cannot be observed directly. For this reason, I have built the models within the mixture model framework. Maximum likelihood approaches, implemented with the EM algorithm, are derived to estimate genetic parameters related to mutation rates, chromosome duplication rates, genetic imprinting, genetic interactions, and haplotype frequencies. I have performed various sets of computer simulations to investigate the statistical properties of the new models in terms of power, estimation precision, and false positive rates. A series of practical computational issues, including convergence rates and choices of initial values, are discussed. I have also formulated various testable hypotheses about the frequencies of genetic mutations and the effects of host genes, cancer genes, and their interactions on cancer susceptibility. This dissertation provides the most complete set of statistical models for cancer gene identification in the literature thus far. The biological relevance and statistical sophistication of these models will make them practically useful for unlocking the genetic secrets of cancer.
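As an illustration of the kind of missing-data estimation mentioned above, the following is a minimal sketch of the textbook EM algorithm for estimating two-locus haplotype frequencies from unphased genotypes. It is not taken from the dissertation and does not reproduce any of its models; the function name `em_haplotype_freqs`, the genotype coding, and the uniform starting values are assumptions made for this example. The only ambiguous genotype is the double heterozygote, whose phase (AB/ab or Ab/aB) is treated as the missing data.

```python
# Illustrative sketch only: classic two-locus haplotype-frequency EM,
# not the dissertation's own models. All names here are hypothetical.

# Genotype (i, j) = copies of allele A at locus 1, copies of allele B at locus 2.
# Each unambiguous genotype maps to fixed haplotype counts in the order
# (AB, Ab, aB, ab).
FIXED = {
    (2, 2): (2, 0, 0, 0),
    (2, 1): (1, 1, 0, 0),
    (2, 0): (0, 2, 0, 0),
    (1, 2): (1, 0, 1, 0),
    (1, 0): (0, 1, 0, 1),
    (0, 2): (0, 0, 2, 0),
    (0, 1): (0, 0, 1, 1),
    (0, 0): (0, 0, 0, 2),
}


def em_haplotype_freqs(counts, n_iter=200, tol=1e-12):
    """Estimate (AB, Ab, aB, ab) frequencies from genotype counts."""
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no individuals in counts")

    # Haplotype counts contributed by genotypes whose phase is known.
    base = [0.0, 0.0, 0.0, 0.0]
    for geno, haps in FIXED.items():
        c = counts.get(geno, 0)
        for k in range(4):
            base[k] += c * haps[k]
    n_dh = counts.get((1, 1), 0)  # double heterozygotes, phase unknown

    p = [0.25, 0.25, 0.25, 0.25]  # assumed uniform initial values
    for _ in range(n_iter):
        # E-step: posterior probability that a double heterozygote is AB/ab.
        cis = p[0] * p[3]
        trans = p[1] * p[2]
        w = cis / (cis + trans) if (cis + trans) > 0 else 0.5

        # M-step: expected haplotype counts divided by total haplotypes (2n).
        expected = [
            base[0] + n_dh * w,
            base[1] + n_dh * (1.0 - w),
            base[2] + n_dh * (1.0 - w),
            base[3] + n_dh * w,
        ]
        p_new = [h / (2.0 * n) for h in expected]

        converged = max(abs(a - b) for a, b in zip(p, p_new)) < tol
        p = p_new
        if converged:
            break
    return p
```

A call such as `em_haplotype_freqs({(2, 2): 10, (0, 0): 10})` involves no ambiguous genotypes, so the estimate is simply the direct haplotype count. In less trivial data sets the result can depend on the starting values, which is one reason the choice of initial values is a practical concern for EM-based methods.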
Keywords/Search Tags: Cancer, Genes, Statistical, Model