
Statistical Integrative Omics Methods For Disease Subtype Discovery

Posted on: 2018-02-20 | Degree: Ph.D. | Type: Dissertation
University: University of Pittsburgh | Candidate: Huo, Zhiguang | Full Text: PDF
GTID: 1474390020457479 | Subject: Biostatistics
Abstract/Summary:
Disease phenotyping using omics data has become a popular approach that can potentially lead to better personalized treatment. Identifying disease subtypes via unsupervised machine learning is the first step towards this goal. With the accumulation of massive high-throughput omics data sets, omics data integration has become essential for improving statistical power and reproducibility. In this dissertation, the sparse K-means method is extended in two directions.

The first extension is a meta-analytic framework for identifying novel disease subtypes when expression profiles from multiple cohorts are available. Combining lasso regularization with meta-analysis identifies a single common set of gene features for subtype characterization, and adding a pattern-matching reward function makes the subtype signatures consistent across studies.

The second extension integrates multi-level omics data sets by incorporating prior biological knowledge through a sparse overlapping group lasso approach. An algorithm based on the alternating direction method of multipliers (ADMM) is applied for fast optimization.

For both topics, simulations and real applications in breast cancer and leukemia demonstrate superior clustering accuracy, feature selection, and functional annotation. These methods improve the statistical power, prediction accuracy, and reproducibility of disease subtype discovery.

Contribution to public health: The proposed methods can identify disease subtypes from complex multi-level or multi-cohort omics data. Defining disease subtypes is essential for delivering personalized medicine, since treating each subtype with its most appropriate medicine achieves the most effective treatment and reduces side effects. Omics data can define disease subtypes better than conventional pathological approaches. By integrating multi-level or multi-cohort omics data, we gain statistical power and reproducibility, and the resulting subtype definitions are much more reliable, convincing, and reproducible than those from a single-study analysis.
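Both extensions build on sparse K-means. As a minimal sketch of the feature-selection step that method relies on, the Python below implements the feature-weight update from the original sparse K-means formulation of Witten and Tibshirani (2010), not the dissertation's meta-analytic or group-lasso variants. The function names (`soft_threshold`, `l2_normalize`, `update_feature_weights`) are hypothetical, and the input is assumed to be a precomputed list of per-feature between-cluster sums of squares together with an L1 bound `s >= 1`.

```python
import math


def soft_threshold(x, delta):
    """Soft-thresholding operator S(x, delta) = sign(x) * max(|x| - delta, 0)."""
    return math.copysign(max(abs(x) - delta, 0.0), x)


def l2_normalize(v):
    """Scale v to unit Euclidean norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def update_feature_weights(bcss, s, tol=1e-8, max_iter=100):
    """Hypothetical helper: sparse K-means feature-weight update (sketch).

    Maximizes sum_j w_j * a_j subject to ||w||_2 <= 1, ||w||_1 <= s, w_j >= 0,
    where a_j = bcss[j] is the between-cluster sum of squares of feature j.
    The solution is w = S(a_+, delta) / ||S(a_+, delta)||_2, with delta = 0
    if that already meets the L1 bound, otherwise delta is found by bisection
    so that ||w||_1 is (approximately) equal to s.
    """
    if s < 1:
        raise ValueError("the L1 bound s must be at least 1")
    a_plus = [max(a, 0.0) for a in bcss]  # negative scores get zero weight
    w = l2_normalize(a_plus)
    if sum(w) <= s:  # no shrinkage needed
        return w
    lo, hi = 0.0, max(a_plus)
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        w = l2_normalize([soft_threshold(a, mid) for a in a_plus])
        if sum(w) > s:
            lo = mid  # still too dense: threshold harder
        else:
            hi = mid  # feasible: try a smaller threshold
        if hi - lo < tol:
            break
    # hi always satisfies the L1 bound, so use it for the final weights.
    return l2_normalize([soft_threshold(a, hi) for a in a_plus])


# Usage: features with small between-cluster separation are dropped.
weights = update_feature_weights([10.0, 5.0, 1.0, 0.5], s=1.2)
print([round(x, 3) for x in weights])
```

In the full sparse K-means algorithm this update alternates with a weighted K-means assignment of samples until the weights stabilize; smaller values of `s` select fewer features.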
Keywords/Search Tags:Omics, Disease, Subtype, Statistical, Methods, Using