Font Size: a A A

Optimization Models And Algorithms Of The Haplotype And Genotype Problems

Posted on:2008-01-24Degree:DoctorType:Dissertation
Country:ChinaCandidate:Y WangFull Text:PDF
GTID:1100360242967509Subject:Operational Research and Cybernetics
Abstract/Summary:PDF Full Text Request
Single nucleotide polymorphism(SNP) is the most frequent form to address genetic differences, and is the core of determining human disease susceptibility and drug reaction. The nucleotides in a SNP site are called alleles. In a diploid organism, the SNP sequence information on each copy of a pair of chromosomes is called a haplotype. A genotype is a sequence of unordered allele pairs on a pair of chromosomes. In disease association studies, the haplotypes have generally more information content than individual SNPs or genotypes. However, it is substantially difficult to determine haplotypes than to determine individual SNPs or genotypes through experiments. Hence, this paper discusses some problems of haplotypes utilizing SNP data and genotype data, such as haplotype reconstruction and haplotype reconstruction with genotype information. In addition, missing phenomenon and redundant phenomenon exist at large, and these phenomena hinders further analysis and research, so some method for solving them have been discussed in this paper. The main work concerns with the establishment of mathematical models and the construction of optimistic algorithms of haplotype problems. The purpose is to research these problems from computational viewpoint.The disseration includes three aspects, which is summarized as follows:(1) In Chapter III, minimum error correction(MEC) problem and haplotype reconstruction -with genotype information(HRG) are studied. Two mathematical models are established for the two problems, and some properties are proved, such as existence of feasible solution, boundness of objective function and existence of optimal solution. Finally, two heuristic algorithms are constructed. First, construct two kinds of distance functions to measure the similarity degree and the difference degree of SNP fragments, then design a re-clustering algorithm based on the two functions. This algorithm can be performed to solve instances with large size, and overcome some defects of genetic algorithm. In this algorithm, we depend on adding the number of SNP fragments in order to improve the results. However, the number of SNP fragments is sometimes limited, so we further study and discuss HRG problem. Forward neural network(FNN) algorithm for HRG model-the only approximate algorithm, has some defects. This algorithm tends to local optimization, and selection of parameters and initial weights have important effect on numerical results. In addition, obtained haplotype pair is not neccessary to be compatible with given genotype. Hence, we construct a heuristic algorithm based on distance function-Iterative Local-exhaustive Scarch(ILES) Algorithm. Compared with FNN algorithm, ILES algorithm is simple, and it hasn't been effected by various parameters. Finally extensive numerical results prove that our algorithm is more effective.(2) In Chapter IV, transform the tagSNP selection problem based on LD into minimum set-covering problem, and establish a mathematical optimization model. Then establish a heuristic function based on two heuristic factors-the number of elements in a set and the covering degree of a set. Finally construct a algorithm based on the heuristic function. Compared with greedy algorithm, our algorithm still considers another heuristic factor-the covering degree of a set. By experiments, our results prove that the constructed algorithm is superior to greedy algorithm. Although the improvement is trivial, it is enough to prove the effectiveness and the rationality of algorithm. Namely, heuristic algorithm can be improved by adding heuristic factor.(3) In Chapter V, we study the imputation of missing genotype data. First uses mutual information theory to measure dependence of SNP sites. We utilize joint mutual information to compute dependence of SNP sites, and then construct an extension method to haplotype-based imputation. To verify our algorithm, extensive experiments have been performed, and numerical results show that our algorithm is superior to haplotype-bascd imputation method. At the same time, numerical results also prove joint mutual information can better measure dependence of SNP sites. By experiments, we also conclude that dependence between the adjoint SNP sites is not necessary strongest.
Keywords/Search Tags:Optimization Model, Optimistic Algorithm, Single Nucleotide Polymorphism, Haplotype, Genotype
PDF Full Text Request
Related items