Font Size: a A A

Haplotyping And Haplotype Frequency Estimation

Posted on:2007-05-20Degree:DoctorType:Dissertation
Country:ChinaCandidate:Q F ZhangFull Text:PDF
GTID:1118360185951401Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
The rapid development of computer and network technology has equipped molecular biologist with novel powerful methods. Haplotype analysis has attracted great attention of biology and medical researchers for its significance in Iatrology. especially in studies on genetic diseases. However, most organisms, including human beings, are diploid. Since it is both time-consuming and expensive to obtain haplotypes, genotypes, instead, are generated in laboratories. When haplotype information is demanded, we must resort to computational methods to split each genotype to a pair of haplotypes. It is called hapiotyping. We discuss the computational complexities for the hapiotyping problem on different data sets and of different models. We also present a series of efficient hapiotyping algorithms and haplotype frequency estimation methods. The main contributions include:(1) POPULATION DATA HAPLOTYPING Bearing no pedigree information, population genotype data is the most frequent kind of data set. Clark algorithm, PPH algorithm, and EM algorithm. GS algorithm were brought forward for hapiotyping on population genotype data. A newly proposed model, called HMP, that bases on the maximum parsimony rule is studied in this paper. HMP model is very simple yet performs pretty well in practice. However, we prove it to be NP-hard and APX-hard (which means that unless P = NP, HMP cannot be approximated in polynomial time within ratio ]+e for some constant e) as well. This paper presents a polynomial time greedy algorithm and a compound algorithm that combines the greedy policy with the branch-and-bound strategy in a uniform framework. The experimental result shows that the greedy algorithm runs much fast, yet outputs pretty well results. Although it is also complete, the compounded algorithm is much more efficient than the branch-and-bound algorithm and can be applied to instances of much larger scales.Relevant to the Clark Algorithm, Single genotype hapiotyping problem (SGH) can help to better understand the hapiotyping problem. We show that it is NP-complete.(2) PEDIGREE DATA HAPLOTYPING Hapiotyping on pedigree genotype data is believed to be more credible because the pedigree constraint forces genotypes to settle on some haplotype configurations. Current research interests focus on seeking haplotype configurations of minimum recombinant. Here we present a more realistic k-Minimum Recombinant Hapiotyping (k-MRH) model that introduces an additional constraint to balance the distribution of recombinants in the whole pedigree. An optimized dynamic programming that combines a root selection policy is given. Although k-MRH is also NP-hard. the constraint has greatly reduced the solution space and...
Keywords/Search Tags:computational biology, single nucleotide polymorphism(SNP), genotype, haplotype, haplotype analysis, haplotyping, haplotype configuration, maximum parsimony, pedigree, trio, recombination, minimum recombinant, k-minimum recombinant, zero recombinant
PDF Full Text Request
Related items