Intensive Detection Of Genomic Variants

Posted on: 2014-02-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Gong
Full Text: PDF
GTID: 1220330467480035
Subject: Genomics
Abstract/Summary:
Variants are a basic subject of research in genetics, and genomic variants are the source from which other types of variants arise. Techniques that detect genomic variants accurately and efficiently are therefore the foundation of genetic research, and they also facilitate research in many other fields. With the complete sequencing of the human genome and the application of next-generation sequencing technologies, genomic variant detection has entered a large-scale, high-throughput era. Following the basic steps of library construction, sequencing, read mapping and variant calling, numerous variants can be acquired across the whole genome. Meanwhile, a number of techniques have been developed to detect different kinds of genomic variants. However, there is no gold standard for determining the structural variants and rare variants that exist in large numbers in the genomes of multicellular organisms. This dissertation presents studies on techniques that improve the detection of genomic structural variants and rare variants by customizing library construction protocols and developing the related computational pipelines.

The Ditag technique detects medium-sized deletions from low-coverage sequences. We separately constructed two mate-paired libraries from restriction fragments digested from a liver cancer genome. A total of 3 Gb of data (about 1× the human genome size) of paired mappable reads (ditags) was generated by SOLiD, and 175 medium-sized deletions were inferred by identifying ditags with disordered alignments to the reference genome sequence. Sanger sequencing confirmed an overall detection accuracy of 95%. Good reproducibility was verified by the deletions that were detected in both libraries.
We applied our deletion-detection pipeline to paired-end RRL data from four domesticated chicken lines and detected more than six thousand deletions, showing much higher power than traditional paired-read analysis.

The Pseudo-Sanger technique fills the gaps between paired-end reads to form longer sequences of 500-600 bp. Step-sized libraries were built from genomic DNA of Drosophila melanogaster w1118, with insert sizes ranging from 100 to 600 bp. A total of 5.69 million pseudo-Sanger sequences were assembled from 63.64 million short reads using the custom software AnyTag. By aligning the pseudo-Sanger sequences to the reference genome, 876 structural variants were detected, including 723 deletions, 122 insertions/rearrangements and 31 inversions. An overall accuracy of 85.7% (54/63) was confirmed by experimental validation, demonstrating that the Pseudo-Sanger technique can accurately identify the breakpoints of structural variants.

Read family analysis is a technique that detects rare variants in minute amounts of tissue DNA. One hundred cells were micro-dissected from cirrhotic liver tissue, and the genomic DNA was fragmented, whole-genome amplified and paired-end sequenced. Because the number of DNA templates was much smaller than the number of possible combinations of fragment ends, the paired reads could be clustered into read families. Contamination from sequencing errors could then be removed by analyzing the internal sequence consistency of each family with at least 5 member reads. We obtained a total of 212 Mb of family sequences, each representing a single molecule, from which 93 rare somatic variants were detected.

Detecting structural variants and rare variants is among the most challenging topics in genomics.
Based on experimental and computational innovations, the techniques presented in this dissertation detect a number of structural variants or rare variants from genomic DNA in human cancer cells, hepatocytes and fruit flies, and show advantages in cost-efficiency and accuracy. These techniques have broad application potential in genetics, developmental and cancer biology, ecology and translational medicine.
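
The read family idea described in the abstract can be illustrated with a minimal Python sketch. The abstract states only that paired reads are clustered by their fragment ends and that families with at least 5 members are checked for internal consistency. Everything else here is an assumption made for illustration: the tuple layout `(chrom, start, end, seq)`, the function names, and the 0.8 majority threshold. It is not the dissertation's actual pipeline.

```python
from collections import Counter, defaultdict

MIN_FAMILY_SIZE = 5  # from the abstract: families need at least 5 member reads


def group_into_families(pairs):
    """Cluster reads by the mapped coordinates of their two fragment ends.

    Each item is a hypothetical tuple (chrom, start, end, seq).
    """
    families = defaultdict(list)
    for chrom, start, end, seq in pairs:
        families[(chrom, start, end)].append(seq)
    return families


def family_consensus(seqs, min_agreement=0.8):
    """Per-base majority vote; positions with too little agreement become 'N'.

    The 0.8 threshold is an assumed value, not one given in the abstract.
    """
    length = min(len(s) for s in seqs)
    consensus = []
    for i in range(length):
        base, count = Counter(s[i] for s in seqs).most_common(1)[0]
        consensus.append(base if count / len(seqs) >= min_agreement else "N")
    return "".join(consensus)


def consensus_by_family(pairs):
    """Return one consensus sequence per family that is large enough."""
    result = {}
    for key, seqs in group_into_families(pairs).items():
        if len(seqs) >= MIN_FAMILY_SIZE:
            result[key] = family_consensus(seqs)
    return result
```

Under these assumptions, a base that differs in only one member of a six-read family is outvoted and treated as a sequencing error, while a difference shared by the whole family survives into the consensus and remains a candidate variant of the original template molecule.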
Keywords/Search Tags: Next-generation sequencing, genomic variant, structural variant, rare variant, restriction enzyme