Computational assessment of somatic missense mutations detected in tumor sequencing studies with cancer-specific high-throughput annotation of somatic mutations (CHASM)

Posted on: 2013-06-28
Degree: Ph.D.
Type: Dissertation
University: The Johns Hopkins University
Candidate: Carter, Hannah
Full Text: PDF
GTID: 1454390008464048
Subject: Engineering
Abstract/Summary:
Missense mutations are a key mechanism by which important cellular behaviors, such as cell growth, proliferation, and survival, are disrupted in cancer. However, only a fraction of the missense mutations observed in tumor genomes are expected to be cancer-causing. Distinguishing tumorigenic "driver" mutations from their neutral "passenger" counterparts is currently a pressing problem in cancer research.

Advances in DNA sequencing technologies in the last decade have enabled exhaustive cataloging of somatic mutations in whole tumor genomes. Missense mutations are detected at high frequency in tumor sequencing studies, often numbering in the hundreds to thousands. A small number of these mutations occur at high frequency and are almost certainly drivers, but the vast majority occur at low frequency and are of ambiguous relevance to cancer. Experimentally verifying each of these mutations is impractical, as current methods often require years of labor.

To address this issue, I have developed CHASM, a high-throughput method based on the supervised machine learning algorithm Random Forest. CHASM seeks to discriminate driver from passenger missense mutations with high specificity by using a unique training set, composed of driver mutations curated from the COSMIC database and synthetic passengers simulated to represent random mutations likely to arise in tumors. CHASM demonstrates high coverage and performs well compared to similar methods in benchmark tests and hold-out validation experiments.

I have applied CHASM to missense mutations detected in 15 tumor sequencing studies of 12 different tumor types. In each application, CHASM recognizes known driver mutations, even when they are withheld from its training set, and implicates new mutations as putative drivers. Pathway analysis and functional annotation of the genes harboring these mutations indicate that many of them participate in processes that are altered in tumorigenesis. Comparison to methods used in routine analysis of somatic missense mutations indicates that CHASM may provide a useful and non-redundant tool for identifying candidate driver mutations in tumor sequencing studies.
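
The training setup described in the abstract (labeled drivers versus simulated passengers, an ensemble of randomized trees, and evaluation on held-out examples) can be illustrated with a toy sketch. This is not CHASM's implementation: the abstract does not specify its features, data, or Random Forest settings. Everything below is an assumption made for illustration, including the three numeric features, the Gaussian toy data, and the function names. To stay dependency-free, a full Random Forest is replaced with a much simpler stand-in: bagged one-split trees (decision stumps), each fit on a bootstrap sample using a random subset of features.

```python
import random

def make_toy_mutations(n_per_class, rng):
    """Return (features, label) pairs; label 1 = driver, 0 = synthetic passenger.
    The three numeric features are hypothetical placeholders, not CHASM's features."""
    data = []
    for label in (0, 1):
        for _ in range(n_per_class):
            features = [
                rng.gauss(2.0 * label, 1.0),  # informative feature (assumed)
                rng.gauss(1.5 * label, 1.0),  # informative feature (assumed)
                rng.gauss(0.0, 1.0),          # pure noise feature
            ]
            data.append((features, label))
    return data

def fit_stump(sample, feature):
    """Find the single-threshold split on one feature with the fewest training errors.
    Returns (errors, feature, threshold, sign)."""
    best = None
    for x, _ in sample:
        threshold = x[feature]
        for sign in (1, -1):
            errors = sum(
                1 for xi, yi in sample
                if (sign * (xi[feature] - threshold) > 0) != (yi == 1)
            )
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, sign)
    return best

def fit_forest(data, n_trees, rng):
    """Bagged stumps: each tree sees a bootstrap sample and 2 randomly chosen features."""
    n_features = len(data[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in data]          # bootstrap resample
        candidates = rng.sample(range(n_features), 2)      # random feature subset
        stumps = (fit_stump(sample, f) for f in candidates)
        forest.append(min(stumps, key=lambda s: s[0]))
    return forest

def driver_score(forest, features):
    """Fraction of trees voting 'driver' for one mutation's feature vector."""
    votes = sum(1 for _, f, t, sign in forest if sign * (features[f] - t) > 0)
    return votes / len(forest)

rng = random.Random(0)
train = make_toy_mutations(60, rng)
held_out = make_toy_mutations(60, rng)   # never seen during training
forest = fit_forest(train, n_trees=15, rng=rng)
accuracy = sum(
    (driver_score(forest, x) > 0.5) == (y == 1) for x, y in held_out
) / len(held_out)
print(f"held-out accuracy on toy data: {accuracy:.2f}")
```

The vote fraction returned by `driver_score` plays the role of a ranking score: mutations can be sorted by it and the top of the list treated as candidate drivers. The 0.5 cutoff used for the accuracy figure is arbitrary here, and a stricter cutoff would trade coverage for specificity.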
Keywords/Search Tags:Mutations, CHASM, Tumor sequencing studies, Somatic, Cancer, Detected