Detecting structural variants in the human genome using high throughput technology

Posted on: 2014-09-05
Degree: Ph.D.
Type: Dissertation
University: The Johns Hopkins University
Candidate: Halper-Stromberg, Eitan
Full Text: PDF
GTID: 1454390005999225
Subject: Genetics
Abstract/Summary:
Background: Structural variants (SVs) and copy number variants (CNVs) play an important role in human phenotypic variation. Germline SVs and CNVs are important in disease susceptibility for both multigenic and monogenic disorders. Somatic SVs and CNVs may be either crucial, as in the case of antibody generation, or lethal, as in the case of cancer-causing mutations.

Methods: I analyzed data from both microarray and next-generation sequencing experiments to determine the CNV and SV detection capacity of each. Three experiments dominated these efforts. The first was a spike-in experiment in which six microarray platforms were assessed for their ability to measure CNVs of known quantity at known locations in the genome. The second was a microarray experiment with 135 technical replicate pairs, in which CNV detection was assessed by the concordance between the two members of each pair. The third was a target capture sequencing experiment in which I assessed the ability of paired-end short reads to reconstruct SV events within highly repetitive regions of the genome.

Results and Conclusions: In the microarray datasets, we found that CGH arrays did relatively well at CNV dosage estimation, while SNP arrays did relatively well at detection. We developed a pipeline for CNV detection that outperformed the pipelines offered by the manufacturers of each platform. Importantly, this pipeline used loess smoothing to mitigate waves caused by fluctuations in genomic GC content. To address the wave problem further, we developed and tested a new algorithm for microarray GC correction, based upon an algorithm used for sequencing data. Our method optimized, for each dataset, the size of the window around each probe within which GC content was quantified. It outperformed the current standard, which used one window size for all datasets. In our study of short-read sequencing data, we found repetitive sequences to be the main challenge for SV and CNV detection. We created a new method to rank SV and CNV candidates, taking into account improper alignments within repetitive sequence. Our method outperformed three other methods designed to handle short reads from repetitive loci.
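
As a rough illustration of the per-dataset window-size idea described above, the sketch below quantifies GC content around each probe for several candidate window sizes and keeps the size whose fitted GC trend leaves the least residual variance in the probe log ratios. This is not the dissertation's algorithm. The selection criterion (minimum residual variance), the quadratic polynomial fit standing in for loess smoothing, and all function and parameter names (`gc_fraction`, `fit_gc_trend`, `choose_window`) are assumptions made for illustration only.

```python
import numpy as np


def gc_fraction(seq, center, window):
    """Fraction of G/C bases in a window centered on a probe position.

    The window is clipped at the ends of the sequence.
    """
    half = window // 2
    lo = max(0, center - half)
    hi = min(len(seq), center + half + 1)
    region = seq[lo:hi]
    return sum(base in "GC" for base in region) / len(region)


def fit_gc_trend(gc, log_ratio, degree=2):
    """Fit a smooth trend of log ratio against GC content.

    Assumption: a low-degree polynomial is used here as a simple
    stand-in for loess smoothing.
    """
    coeffs = np.polyfit(gc, log_ratio, degree)
    return np.polyval(coeffs, gc)


def choose_window(seq, positions, log_ratio, windows):
    """Try each candidate window size and keep the best one.

    Assumption: "best" means the GC trend fitted at that window size
    leaves the smallest residual variance. Returns a tuple of
    (window size, residual variance, GC-corrected log ratios).
    """
    best = None
    for w in windows:
        gc = np.array([gc_fraction(seq, p, w) for p in positions])
        resid = log_ratio - fit_gc_trend(gc, log_ratio)
        score = float(np.var(resid))
        if best is None or score < best[1]:
            best = (w, score, resid)
    return best
```

Calling `choose_window(seq, positions, log_ratio, windows=[11, 51, 201])` on a reference sequence string, an array of probe positions, and the matching array of log ratios returns the selected window size together with the corrected values. A fixed window size for all datasets, the standard the abstract compares against, corresponds to passing a single-element `windows` list.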
Keywords/Search Tags:CNV, Variants, Using, Genome, Repetitive, Cnvs, Method