Font Size: a A A

Research On Enumeration Algorithms For The Haplotype Assembly Problem

Posted on:2015-01-17Degree:MasterType:Thesis
Country:ChinaCandidate:Y YangFull Text:PDF
GTID:2268330431467991Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Haplotyping plays an important role in locating complex disease susceptibility genes. The haplotype assembly problem is a computational problem that, given a set of DNA sequence fragment data of an individual, induces the corresponding haplotypes. For the problem, based on different optimal criteria, there are many different computational models, such as Minimum SNP Removal (MSR), Minimum Fragment Removal (MFR), Minimum Error Correction (MEC), etc. They aim to reconstruct a pair of haplotypes that is optimal. One pair of optimal haplotypes is not enough for the biologist to analyse the complex biological problems. This paper proposes enumerative algorithms for MSR, MFR and MEC.Given a SNP matrix, MSR model is try to remove the minimum number of rows so that we can get the haplotype of individual. We define K_MSR (k-Minimum SNP Removal Enumeration):given a SNP matrix, a small positive integer k, enumerate at most k solutions for MSR model. We designed an algorithm for K_MSR, whose time complexity and space complexity are O(nk1k2+nkki+mlogm+mk1) and O(nkk1) respectively, where m is the number of fragments, n is the number of SNPs, k1is the maximum number of SNPs that a fragment covers, and k2is the maximum number of the fragments covering a SNP site.Given a SNP matrix, MFR model is try to remove the minimum number of fragments so that we can get the haplotype of individual. We define K_MFR (k-Minimum Fragment Removal Enumeration) as:given a SNP matrix, a small positive integer k, enumerate at most k solutions for MFR model. We designed an algorithm for K_MFR, whose time complexity and space complexity are O(mkk22+mkk1k2+mlogm+nk2) and O(mk1+mkk22) respectively. Based on MEC, we defined K_MEC (k-Minimum Error Correction Enumeration) as:given a SNP matrix, a small positive integer k, enumerate at most k solutions for MEC model. We designed an algorithm for K_MEC, whose time complexity and space complexity are O(nk22k2+mlogm+mk1) and O(mkk12k1+nk2) respectively.Extensive experiments show that these algorithms can effectively provide multiple optimal solutions to biologists for further analyses, and it’s useful for them to analyze complex biological problems.
Keywords/Search Tags:haplotype assembly problem, multiple optimal solutions, parameterized algorithm, dynamic programming, bioinformatics
PDF Full Text Request
Related items