Mining Large-scale Sequencing Reads to Learn Mutational Processes

Posted on:2018-01-02

Degree:Ph.D.

Type:Dissertation

University:Yale University

Candidate:Li, Shantao

GTID:1444390005953741

Subject:Bioinformatics

Abstract/Summary:

Mutations are alterations in DNA that can have critical and permanent functional and evolutionary consequences. Understanding the underlying mutational processes and mechanisms is a cornerstone of genomics research. With modern high-throughput sequencing, researchers have access to an unprecedented abundance of DNA mutation data. This dissertation presents computational methods and analyses of large-scale sequencing reads that reveal details of mutational processes in human DNA. Specifically, I focus on 1) single nucleotide variants in human cancer, 2) deletion breakpoints and 3) retroduplications in the human germline. In cancer, I develop a LASSO-based method to identify active mutational processes in tumor samples. It gives sparse, biologically interpretable solutions and can leverage prior knowledge learned from pan-cancer analysis. Furthermore, I propose a generative model that integrates mutational heterogeneity in both nucleotide contexts and genomic locations. By exploiting the fingerprints of mutational processes in both aspects, this framework can potentially better identify mutational processes and help reveal the underlying biology. Using papillary renal cell carcinoma (pRCC) and data from the Pan-cancer Analysis of Whole Genomes (PCAWG) as case studies, I showcase the power of these methods in cancer genomics. In the human germline, I jointly analyze the 1000 Genomes Project data with other genomic annotations. I demonstrate how strong selection and mutational mechanisms together shape the distribution of deletions in human genomes. In addition, I develop a method specifically targeting retroduplications in human genomes. Using this method, I obtain the largest human retroduplication variation set from 26 populations. These retroduplications reveal population structure and give hints about recent human evolution and divergence. Further insertion point analysis shows how selection and mutational processes drive the nonrandom distribution of retroduplications in the genome. Finally, to address the explosion of biological data, I optimize the algorithm for a Monte Carlo simulation method in protein surface sampling. The new algorithm reduces the computational complexity to O(n^2) and thus makes it practical to apply the sampling method to large real-world proteins and complexes.
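
To make the LASSO-based idea in the abstract concrete, here is a minimal sketch of how a sparse set of active mutational processes might be picked out of a tumor's mutation catalog. It is not the dissertation's implementation. It assumes a known matrix of signatures (one column per process) over 96 mutation contexts, a channel count chosen here only for illustration, and it fits non-negative exposures with an L1 penalty using a plain proximal gradient loop. The function names, the penalty value and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_exposures(catalog, signatures, lam=0.01, n_iter=5000):
    """Non-negative LASSO fit of signature exposures (illustrative only).

    Minimizes 0.5 * ||catalog - signatures @ w||^2 + lam * sum(w), with w >= 0.
    catalog:    (n_channels,) mutation counts per context
    signatures: (n_channels, n_signatures), each column sums to 1
    """
    catalog = np.asarray(catalog, dtype=float)
    signatures = np.asarray(signatures, dtype=float)
    gram = signatures.T @ signatures
    target = signatures.T @ catalog
    # Step size 1/L, where L is the largest eigenvalue of the Gram matrix.
    step = 1.0 / np.linalg.eigvalsh(gram)[-1]
    w = np.zeros(signatures.shape[1])
    for _ in range(n_iter):
        grad = gram @ w - target
        # For w >= 0 the L1 penalty is linear, so the proximal step is
        # a shift by lam followed by clipping at zero.
        w = np.maximum(0.0, w - step * (grad + lam))
    return w


def make_toy_problem(seed=0):
    """Synthetic data: 5 made-up signatures, only 2 of them active."""
    rng = np.random.default_rng(seed)
    signatures = rng.dirichlet(np.full(96, 0.2), size=5).T  # shape (96, 5)
    true_w = np.array([300.0, 0.0, 0.0, 120.0, 0.0])
    catalog = signatures @ true_w  # noiseless expected counts
    return catalog, signatures, true_w


if __name__ == "__main__":
    catalog, signatures, true_w = make_toy_problem()
    print(np.round(fit_exposures(catalog, signatures), 2))
```

In this toy setting the L1 penalty pushes the exposures of the three inactive signatures toward zero while the two active ones stay close to their true values, which is the kind of sparse, interpretable solution the abstract describes. Prior knowledge, such as which processes are plausible for a given cancer type, could in principle enter through signature-specific penalties, but that extension is an assumption and is not shown here.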

Keywords/Search Tags:

Mutational, DNA, Method, Sequencing, Human

Related items

1	Mutational Spectrum And Prognostic Stratification Of AML Based On Next Generation Sequencing
2	The Features Of ROS1 Positive NSCLC And Mutational Profiles With Crizotinib Resistance
3	Characterization Of Copy Number Variations At The Single-base-pair Level Through WGS And The Mutational Mechanisms Revealed
4	Mutational Spectrum Of Promoter And Its Correlation With Tumor Mutational Burden In Lung Squamous Cell Carcinoma Concomitant TP53 Mutations With Response To Crizotinib Treatment In Patients With ALK-rearranged Non-small Cell Lung Cancer
5	The Study On Gene Mutations Of Esophageal Squamous Cell Carcinoma Based On Next-generation Sequencing
6	The Mutational Characteristics Of Clear Cell Hepatocellular Carcinoma
7	Mutational Spectrum And Prognosis Analysis Of AML Patients Based On High-throughput Sequencing
8	Mutational Analysis Of L1 Protein And Development Of Virus-like Particles Vaccine Of Human Papillomavirus Type 11
9	Clonality Analysis And Mutational Status Of IGVH Gene In Richter's Syndrome
10	Mutational Profiling Of A Long-term Surviving Stage ? Colorectal Cancer Patient Using High Throughput Next-Generation Sequencing