
Biomarker discovery and clinical outcome prediction using knowledge-based bioinformatics

Posted on: 2010-05-23
Degree: Ph.D.
Type: Dissertation
University: Georgia Institute of Technology
Candidate: Phan, John H
Full Text: PDF
GTID: 1448390002486175
Subject: Engineering
Abstract/Summary:
Advances in high-throughput genomic and proteomic technology have led to a growing interest in cancer biomarkers. These biomarkers can potentially improve the accuracy of cancer subtype prediction and, subsequently, the success of therapy. However, identifying statistically and biologically relevant biomarkers from high-throughput data can be unreliable because of the nature of the data: high technical variability, small sample size, and high dimensionality. Because few training samples are available, data-driven machine learning methods are often insufficient without the support of knowledge-based algorithms. We investigate the benefits of using knowledge-based algorithms to solve clinical prediction problems. Because we are interested in identifying biomarkers that are also feasible in clinical prediction models, we focus on two analytical components: feature selection and predictive model selection. In addition to variance in the data, we must also consider variance among analytical methods. There are many existing feature selection algorithms, each of which may produce different results. Moreover, it is not trivial to identify model parameters that maximize the sensitivity and specificity of clinical prediction. Thus, we introduce a method that uses independently validated biological knowledge to reduce the space of relevant feature selection algorithms and to improve the reliability of clinical predictors.

Biologically relevant feature selection algorithms are those that favor independently validated biomarkers. We show that using these biomarkers to guide the choice of feature ranking algorithm and its parameters improves the efficiency of detecting new biomarkers that are also likely to validate. Furthermore, the algorithm selection process evolves iteratively as it learns and incorporates new biomarkers into the knowledge set.
Using both maximum likelihood and maximum a posteriori approaches, we show that the choice of an optimal or biologically relevant method changes in the presence of knowledge feedback.

The clinical utility of biomarkers depends on their feasibility in clinical prediction applications. Thus, following an approach similar to that of the FDA Microarray Quality Control (MAQC) Consortium, and in collaboration with it, we examine several microarray datasets to assess the effect of knowledge-guided feature selection on prediction accuracy. The microarray datasets in our study vary in sample size and clinical focus. For each clinical focus (renal cancer, prostate cancer, and breast cancer), we build and test classification models using independent training and testing datasets in order to reduce prediction bias. Results of these experiments indicate that knowledge-guided feature selection improves clinical prediction.

Finally, one of the primary obstacles in translating research to clinical applications is the inaccessibility of bioinformatics applications to the general community of clinicians and biologists. Therefore, we implement several functions of the knowledge-based framework as a user-friendly web application called omniBiomarker. We develop the functions of omniBiomarker according to the standards of the NCI cancer Biomedical Informatics Grid (caBIG), further increasing the overall impact of this work.
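
The idea of preferring the feature ranking algorithm that favors independently validated biomarkers can be illustrated with a small sketch. The abstract does not specify the dissertation's actual scoring rules or candidate algorithms, so everything below is an assumption made for illustration: the two ranking scores, the top-k agreement measure, the function names, and the toy expression values for the hypothetical genes G1, G2, and G3.

```python
from statistics import mean, pvariance

def fold_change_score(a, b):
    """Absolute difference between the two class means."""
    return abs(mean(a) - mean(b))

def t_like_score(a, b):
    """Mean difference scaled by within-class spread (a t-statistic-like score)."""
    spread = (pvariance(a) / len(a) + pvariance(b) / len(b)) ** 0.5
    return abs(mean(a) - mean(b)) / (spread + 1e-9)  # small constant avoids division by zero

def rank_features(data, labels, score_fn):
    """Order feature names from most to least discriminative under score_fn."""
    scores = {}
    for name, values in data.items():
        a = [v for v, y in zip(values, labels) if y == 0]
        b = [v for v, y in zip(values, labels) if y == 1]
        scores[name] = score_fn(a, b)
    return sorted(scores, key=scores.get, reverse=True)

def knowledge_agreement(ranking, known, top_k):
    """Fraction of known (validated) biomarkers that appear in the top_k features."""
    return len(set(ranking[:top_k]) & known) / len(known)

def select_ranker(data, labels, rankers, known, top_k):
    """Pick the ranking method whose top_k list best recovers the known biomarkers."""
    results = {
        name: knowledge_agreement(rank_features(data, labels, fn), known, top_k)
        for name, fn in rankers.items()
    }
    best = max(results, key=results.get)
    return best, results

# Toy data (hypothetical): gene -> expression value per sample.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
data = {
    "G1": [1.0, 1.1, 0.9, 1.0, 2.0, 2.1, 1.9, 2.0],  # consistent, modest shift
    "G2": [0.0, 6.0, 0.0, 6.0, 2.0, 8.0, 2.0, 8.0],  # larger shift, very noisy
    "G3": [5.0, 5.2, 4.8, 5.0, 5.1, 4.9, 5.0, 5.0],  # no class difference
}
known = {"G1"}  # assumed to be independently validated
rankers = {"fold_change": fold_change_score, "t_like": t_like_score}

best, results = select_ranker(data, labels, rankers, known, top_k=1)
print(best, results)
```

In this toy case the fold-change score ranks the noisy gene G2 first, while the t-like score ranks the known biomarker G1 first, so the knowledge-guided selection prefers the t-like ranker. The iterative feedback described above would correspond to adding newly validated genes to `known` and repeating the selection.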
Keywords/Search Tags: Prediction, Cancer, Biomarkers, Using, Feature selection, Knowledge-based