Prostate Cancer Classification Based on Gene Expression and Splicing Profiles

Posted on: 2017-02-02
Degree: M.S
Type: Thesis
University: University of California, Los Angeles
Candidate: Meng, Meng
Full Text: PDF
GTID: 2454390008466233
Subject: Statistics
Abstract/Summary:
The purpose of this study was to propose a method for classifying prostate cells into specific diagnostic categories based on their gene expression and exon inclusion levels, and to compare the classification performance of these two types of features. To build a concise statistical model that retains meaningful biological information, we combine univariate analysis with multivariate analysis, using lasso regularization for variable selection. Missing data are a major problem for the exon inclusion levels in our data, so we apply two imputation methods (median and KNN) and compare their results. We addressed our questions by comparing test error rates over 100 iterations of cross-validation, with each model trained before being tested. We found: (1) Exon inclusion level has much stronger predictive ability than gene expression on our data, yielding lower error rates (p-value=1.29e-11 for exon inclusion level imputed by median and 2.20e-16 for exon inclusion level imputed by KNN); (2) The model built on exon inclusion level is more concise, with fewer variables, than the model built on gene expression (p-value=8.15e-6); (3) The choice of imputation method for exon inclusion level does not significantly affect classification results (p-value=5.37e-1).
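The two imputation strategies named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the toy matrix, the choice of k, and the distance measure (root mean squared difference over exons observed in both samples) are all assumptions made for illustration. Rows stand for samples, columns for exons, and None marks a missing inclusion level.

```python
import math
from statistics import median


def impute_median(rows):
    """Replace each missing value (None) with the median of the observed
    values in the same column (hypothetical helper, for illustration)."""
    n_cols = len(rows[0])
    col_medians = [
        median(r[j] for r in rows if r[j] is not None) for j in range(n_cols)
    ]
    return [
        [col_medians[j] if v is None else v for j, v in enumerate(r)]
        for r in rows
    ]


def _distance(a, b):
    """Root mean squared difference over features observed in both samples.
    Returns infinity when the two samples share no observed feature."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def impute_knn(rows, k=1):
    """Replace each missing value with the mean of that feature over the k
    nearest samples in which the feature is observed. Assumes every column
    is observed in at least one other sample."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, v in enumerate(row):
            if v is not None:
                continue
            # Candidate neighbours: other samples where feature j is observed.
            candidates = sorted(
                (
                    (_distance(row, other), other[j])
                    for m, other in enumerate(rows)
                    if m != i and other[j] is not None
                ),
                key=lambda pair: pair[0],
            )
            nearest = [value for _, value in candidates[:k]]
            filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Toy exon inclusion matrix (made-up values between 0 and 1).
psi = [
    [0.10, 0.80, 0.50],
    [0.20, None, 0.40],
    [0.90, 0.10, None],
    [0.80, 0.20, 0.90],
]

median_filled = impute_median(psi)
knn_filled = impute_knn(psi, k=1)

print(median_filled[1][1], knn_filled[1][1])
print(median_filled[2][2], knn_filled[2][2])
```

On this toy matrix the two methods fill the gaps differently: the median uses the whole column, while KNN borrows the value from the most similar sample. Whether that difference matters for classification is the question the abstract's third finding addresses.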
Keywords/Search Tags: Gene expression, Exon inclusion level, Classification