Learning with Kernels and Graphs to Understand Cancer DNA Copy Number Variations

Posted on: 2013-02-13
Degree: Ph.D.
Type: Thesis
University: University of Minnesota
Candidate: Tian, Ze
Full Text: PDF
GTID: 2454390008477261
Subject: Computer Science
Abstract/Summary:
DNA copy number variations (CNVs) are biological indicators that characterize cancer genomes. Predicting cancer prognosis from CNVs and identifying cancer-causing CNVs are challenging problems because of the high dimensionality of CNV features and the heterogeneity of patients. In this thesis, our objective is to build robust predictive models from CNV data using machine learning techniques, both for accurate cancer diagnosis and prognosis and for the identification of cancer-causing CNVs.

We proposed several machine learning models toward these objectives:

1. We developed HyperPrior, a hypergraph-based semi-supervised learning algorithm for cancer outcome prediction from CNV data and gene expression data. It incorporates biological prior knowledge, such as the spatial information in arrayCGH datasets, to obtain consistent weighting of correlated genomic features and thereby improve the accuracy of sample classification. The algorithm can also be used to detect biomarkers or cancer-causing CNVs.
2. We developed an alignment-based kernel method for integrating CNV data from multiple platforms. By integrating datasets generated from different probe sets, the new kernel could improve cancer outcome prediction with an SVM classifier. We also designed a multiple alignment approach based on this alignment kernel to identify CNVs shared among cancer samples, which serve as candidate cancer-causing CNVs for further analysis.
3. We proposed an algorithm that learns a low-rank graph to represent the similarities between data points. This low-rank graph could capture global cluster structures and improve the performance of label propagation. The approach can be applied to arrayCGH datasets as well as to other types of datasets for better sample classification.
4. We proposed a latent feature model that couples sparse sample group selection with fused lasso. Clinical information was used to define the group structure on patient samples. Through sparse group selection, the model was able to identify group-specific CNVs, rather than common CNVs, from arrayCGH datasets.

We evaluated our models on both simulations and several publicly available genomic datasets. The results suggest that these models are promising for improving cancer prognosis prediction and the identification of cancer-causing CNVs.
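
As a rough illustration of the label propagation step mentioned in item 3, the sketch below runs the standard iteration F <- alpha * S * F + (1 - alpha) * Y on a normalized similarity graph. It is not the thesis's algorithm: the graph here is a hand-written toy (two tightly connected groups of three samples joined by one weak edge) rather than a learned low-rank graph, and the function name `propagate_labels`, the parameter `alpha`, and the iteration count are assumptions made only for this example.

```python
import math


def propagate_labels(weights, labels, alpha=0.9, iterations=200):
    """Iterate F <- alpha * S @ F + (1 - alpha) * Y on a similarity graph.

    weights: symmetric n x n list of non-negative edge weights (zero diagonal).
    labels:  length-n list with +1 / -1 for labeled samples and 0 for unlabeled.
    Returns a length-n list of real-valued scores; the sign is the prediction.
    """
    n = len(weights)
    degrees = [sum(row) for row in weights]
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}; isolated nodes get zero rows.
    s = [
        [
            weights[i][j] / math.sqrt(degrees[i] * degrees[j])
            if degrees[i] > 0 and degrees[j] > 0
            else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    scores = [float(y) for y in labels]
    for _ in range(iterations):
        scores = [
            alpha * sum(s[i][j] * scores[j] for j in range(n))
            + (1 - alpha) * labels[i]
            for i in range(n)
        ]
    return scores


# Toy graph (hypothetical): samples 0-2 and 3-5 form two dense groups,
# linked only by a weak edge between samples 2 and 3.
weights = [
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
]
labels = [1, 0, 0, 0, 0, -1]  # one labeled sample per group
scores = propagate_labels(weights, labels)
predictions = [1 if f > 0 else -1 for f in scores]
```

On this toy graph each unlabeled sample takes the label of the labeled sample in its own group, which is the behavior a graph that captures cluster structure is meant to encourage.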
Keywords/Search Tags:Cancer, CNV, Cnvs, Prognosis, Kernel, Models