Small insertion-deletion polymorphisms in the human genome: Characterization and automation of detection by resequencing

Posted on: 2007-05-06
Degree: Ph.D
Type: Dissertation
University: University of Washington
Candidate: Bhangale, Tushar
Full Text: PDF
GTID: 1444390005474404
Subject: Biology
Abstract/Summary:
Insertion-deletions (indels) and other structural rearrangements are beginning to receive considerable attention in the study of human sequence variation, partly because of their role in disease susceptibility. Among these variants, small (∼1-30 base-pair) indels are the most common and constitute ∼24% of the known disease-causing mutations. Because they are abundant, they can also improve the resolution of single nucleotide polymorphism (SNP) based genetic maps and play an important role in the mapping of complex diseases, provided their population genetic characteristics are more fully understood in comparison to those of SNPs. There has been no attempt to systematically study the properties of small indels identified in an unbiased manner. In this work, we evaluated the characteristics of indels identified comprehensively in 330 human genes. Our findings indicate that indels can be valuable in disease mapping studies, since their evolutionary histories are similar to those of SNPs. Despite the importance of small indels, no high-throughput techniques have been developed to help automate their detection and genotyping. While in principle these tasks can be accomplished using sequence trace data from diploid samples, existing approaches to automating them require extensive manual data review, making them inconvenient for large-scale studies. We have developed an algorithm that uses statistical analysis of base-calls, quality scores and peak heights to help automate the detection and genotyping of indels from sequence traces. The algorithm focuses particularly on identifying heterozygous individuals, which allows it to reliably identify low-frequency polymorphisms. It considerably outperforms existing software, providing a level of automation that was not previously available. For example, in our tests it finds 80% of all indel polymorphisms with almost no false positives, and 97% of all indels with 1 false positive per 10 true positives. Additionally, genotyping accuracy exceeds 99%, and the algorithm correctly infers indel length in 96% of cases. We have implemented the method in a software package (PolyPhred version 6.0), freely available for academic use. We apply this algorithm to the large-scale sequence trace data for the ENCODE regions, generated by the HapMap project, to provide the first report of indels in these regions, and identify 1126 novel polymorphisms.
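To illustrate why a heterozygous indel is detectable in a diploid trace at all, the following minimal sketch may help. It is not the PolyPhred algorithm, and it ignores quality scores and peak heights entirely. It assumes a known reference sequence and a known indel start position, and it relies only on the general observation that downstream of a heterozygous indel the two alleles are offset by the indel length, so each trace position shows the reference base together with the base a fixed number of positions further along. All function and parameter names (`mixed_calls`, `infer_indel_length`, `start`, `max_len`) are hypothetical.

```python
def mixed_calls(ref, start, shift):
    """Simulate idealized base pairs from a diploid trace in which one
    allele carries a deletion of `shift` bases beginning at `start`.
    (Hypothetical helper: real traces are noisy.)"""
    calls = []
    for i in range(len(ref) - shift):
        if i < start:
            # Upstream of the indel both alleles agree.
            calls.append((ref[i], ref[i]))
        else:
            # Downstream, the second allele is offset by `shift` bases.
            calls.append((ref[i], ref[i + shift]))
    return calls


def infer_indel_length(ref, calls, start, max_len=30):
    """Return (length, score) for the offset that best explains the
    mixed base calls downstream of `start`. The score is the fraction of
    positions whose observed bases equal {ref[i], ref[i + length]}."""
    best_len, best_score = None, 0.0
    for k in range(1, max_len + 1):
        positions = range(start, min(len(calls), len(ref) - k))
        if len(positions) == 0:
            break  # no downstream positions left to compare
        hits = sum(
            1 for i in positions
            if set(calls[i]) == {ref[i], ref[i + k]}
        )
        score = hits / len(positions)
        if score > best_score:
            best_len, best_score = k, score
    return best_len, best_score


# Assumed toy reference, not real data.
ref = "ACGTTGCAAGCTTAGGCTAACGTACCGATTGCA"
calls = mixed_calls(ref, start=10, shift=3)
print(infer_indel_length(ref, calls, start=10))
```

In this idealized setting the 3-base offset explains every downstream position. A practical method would also have to locate the indel start and tolerate base-calling errors, which is where the statistical use of quality scores and peak heights described in the abstract would come in.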
Keywords/Search Tags:Indels, Polymorphisms, Human, Small, Detection, Sequence