
Density estimation and modal based method for haplotyping and recombination

Posted on: 2011-03-29
Degree: Ph.D.
Type: Dissertation
University: The Pennsylvania State University
Candidate: Mao, Xianyun
GTID: 1460390011971353
Subject: Biology
Abstract/Summary:
Genetic problems such as haplotype inference and recombination analysis are rarely studied with nonparametric models. We present new methods for analyzing genetic data based on kernel density estimation and a modal expectation-maximization (MEM) method. We also use a degrees-of-freedom (DOF) calculation for bandwidth selection and diagnostics.

For the problem of inferring haplotypes from genotypes, we construct a likelihood function that depends on the unknown haplotype density. We then apply a likelihood EM algorithm to a naive initial estimator, producing an updated density with higher likelihood. This density is then used to find the most likely haplotype pair for any genotype. The method is tested on simulated data and on small sets of real data. To improve its performance on large data sets (∼1,000 individuals, 10,000 sites), we develop degrees of freedom (DOF) as a diagnostic tool for partitioning the data. We then use MEM to solve each partition and to merge the solutions. We show that the new method performs comparably to available methods in both running time and accuracy.

In a similar fashion, we can define a density estimator for binary sequences (haplotypes) in the presence of recombination and mutation. With this density, one can estimate the probability of recombination at given sites.
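The haplotyping steps summarized above (a kernel density over haplotypes, a likelihood EM started from a naive estimator, then the most likely pair per genotype) can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state. Genotypes are coded 0/1/2 as the count of allele 1. The kernel is a product Bernoulli (Hamming) kernel whose match probability `lam` plays the role of the bandwidth. The centres are every haplotype compatible with some genotype, and the naive initial estimator gives them uniform weights. EM updates only the mixture weights. `modal_em` shows a modal EM hill-climb on the same density. All function names are hypothetical. This is not the dissertation's implementation, and it enumerates phasings exhaustively, so it only suits genotypes with a few heterozygous sites.

```python
from itertools import product
import math


def compatible_pairs(g):
    """Unordered haplotype pairs (h1, h2) with h1[j] + h2[j] == g[j] at every site.

    Assumed coding: g[j] counts allele 1 (0 or 2 homozygous, 1 heterozygous).
    """
    het = [j for j, x in enumerate(g) if x == 1]
    base = [x // 2 for x in g]  # homozygous sites: 0 -> 0, 2 -> 1
    pairs = []
    # Fix the phase of the first heterozygous site so (h1, h2) and (h2, h1) are not both listed.
    for bits in product((0, 1), repeat=max(len(het) - 1, 0)):
        h1, h2 = list(base), list(base)
        for idx, j in enumerate(het):
            b = 0 if idx == 0 else bits[idx - 1]
            h1[j], h2[j] = b, 1 - b
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def kernel(h, c, lam):
    """Product Bernoulli kernel: each site of h equals c with probability lam (0.5 < lam < 1)."""
    d = sum(a != b for a, b in zip(h, c))
    return lam ** (len(h) - d) * (1 - lam) ** d


def density(h, weights, lam):
    """Kernel mixture f(h) = sum_k w_k K(h; c_k) over the candidate centres c_k."""
    return sum(w * kernel(h, c, lam) for c, w in weights.items())


def log_likelihood(pair_lists, weights, lam):
    """sum_i log sum_{(h1, h2) compatible with g_i} f(h1) f(h2), up to an additive constant."""
    return sum(math.log(sum(density(a, weights, lam) * density(b, weights, lam) for a, b in ps))
               for ps in pair_lists)


def em_step(pair_lists, weights, lam):
    """One EM update of the mixture weights; the centres and lam stay fixed."""
    counts = dict.fromkeys(weights, 0.0)
    for ps in pair_lists:
        f = {h: density(h, weights, lam) for pair in ps for h in pair}
        total = sum(f[a] * f[b] for a, b in ps)
        for a, b in ps:
            post = f[a] * f[b] / total  # posterior probability of this phasing
            for h in (a, b):
                for c, w in weights.items():
                    counts[c] += post * w * kernel(h, c, lam) / f[h]
    return {c: n / (2 * len(pair_lists)) for c, n in counts.items()}


def infer_haplotypes(genotypes, lam=0.9, n_iter=30):
    pair_lists = [compatible_pairs(g) for g in genotypes]
    centres = sorted({h for ps in pair_lists for pair in ps for h in pair})
    weights = {c: 1.0 / len(centres) for c in centres}  # naive uniform starting estimator
    for _ in range(n_iter):
        weights = em_step(pair_lists, weights, lam)
    best = [max(ps, key=lambda p: density(p[0], weights, lam) * density(p[1], weights, lam))
            for ps in pair_lists]
    return weights, best


def modal_em(start, weights, lam, max_iter=100):
    """Modal EM hill-climb from `start` towards a local mode of f.

    E-step: p_k proportional to w_k K(x; c_k). M-step: maximise sum_k p_k log K(x; c_k),
    which for this kernel with lam > 0.5 is a weighted majority vote at each site.
    """
    x = tuple(start)
    for _ in range(max_iter):
        p = {c: w * kernel(x, c, lam) for c, w in weights.items()}
        half = sum(p.values()) / 2
        new = tuple(int(sum(pk for c, pk in p.items() if c[j] == 1) > half) for j in range(len(x)))
        if new == x:
            break
        x = new
    return x


# Usage on a toy data set (hypothetical genotypes, 4 sites).
genotypes = [(0, 1, 2, 1), (1, 1, 0, 2), (0, 0, 2, 1), (1, 1, 0, 2), (0, 1, 2, 0)]
weights, phased = infer_haplotypes(genotypes)
for g, (h1, h2) in zip(genotypes, phased):
    print(g, h1, h2)
```

Holding `lam` fixed and updating only the weights keeps the M-step in closed form (expected component counts divided by 2n), so the likelihood cannot decrease from one iteration to the next. Choosing `lam` itself, which is where a DOF-based bandwidth criterion would enter, is left out of this sketch.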