
Statistical pattern recognition for breast cancer research: Comparison of theory-driven general linear model methodologies with data-driven artificial neural network architectures

Posted on: 2005-01-28
Degree: Ph.D
Type: Thesis
University: Fuller Theological Seminary, School of Psychology
Candidate: Parsons, Thomas D
Full Text: PDF
GTID: 2458390008492795
Subject: Psychology
Abstract/Summary:
Medical informatics data analysis mines databases for knowledge discovery using statistical pattern recognition methodologies. Medical informatics researchers use the knowledge discovered from databases to make predictions and classifications in biomedical predictive models. The primary goal of a predictive model is to cover the space of a given problem domain well enough to make robust predictions. This dissertation addresses the difficulties of building such models, especially the synthesis of the linear methodologies of the general linear model with the nonlinear methodologies of the artificial neural network model. Attempts at pattern recognition and knowledge discovery in medical informatics databases have been plagued by situations in which input data are fuzzy or the optimal algorithm is difficult to ascertain. Data-driven artificial neural network (ANN) and theory-driven general linear model (GLM) strategies may be combined to develop a pragmatic approach by which previously intractable medical informatics problems may be solved. The pragmatic model makes judicious use of both theory-driven (linear and nonlinear) and data-driven (linear and nonlinear) methodologies. The research reported showed that, whereas the GLM outperformed the ANN (a systematically selected, multilayered, fully connected network with a 4-18-1 topology) when using a theory-driven estrogen exposure index (TDEEI) to predict breast cancer, the ANN better modeled the age-occurrence relation. Conversely, when using the data-driven estrogen exposure index (DDEEI), the ANN (a systematically selected, multilayered, fully connected network with a 10-1-1 topology) outperformed the GLM both in predicting breast cancer and in modeling the age-occurrence relation.
Results suggest that a pragmatic model, which makes judicious use of both theory-driven (linear and nonlinear) and data-driven (linear and nonlinear) methodologies, is best.
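As an illustration of the topology notation used in the abstract, the sketch below reads "4-18-1" as 4 inputs, 18 hidden units, and 1 output in a fully connected feed-forward network. That reading, the sigmoid activation, the one-bias-per-unit convention, the random weights, and the helper names (`count_parameters`, `init_network`, `forward`) are all assumptions made for illustration; the abstract does not specify them, and this is not the dissertation's implementation. A logistic-link GLM with 4 predictors is shown as the degenerate topology `[4, 1]`, which is only one possible reading of the GLM being compared.

```python
import math
import random


def count_parameters(topology):
    """Count weights plus biases in a fully connected feed-forward network.

    Assumes one bias per non-input unit (an illustrative convention).
    """
    return sum((n_in + 1) * n_out for n_in, n_out in zip(topology, topology[1:]))


def init_network(topology, seed=0):
    """Build random weights; each unit stores its input weights, then its bias."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]
        for n_in, n_out in zip(topology, topology[1:])
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(network, inputs):
    """Propagate one input vector through every layer with sigmoid units."""
    activations = list(inputs)
    for layer in network:
        activations = [
            sigmoid(unit[-1] + sum(w * a for w, a in zip(unit[:-1], activations)))
            for unit in layer
        ]
    return activations


# Topologies named in the abstract, plus a logistic GLM as a single layer.
ann_tdeei = [4, 18, 1]
ann_ddeei = [10, 1, 1]
glm_like = [4, 1]

for name, topo in [("4-18-1", ann_tdeei), ("10-1-1", ann_ddeei), ("GLM-like 4-1", glm_like)]:
    print(name, "free parameters:", count_parameters(topo))

# Untrained network, hypothetical input values: the output is a value in (0, 1).
print(forward(init_network(ann_tdeei), [0.1, 0.2, 0.3, 0.4]))
```

Under these assumptions the 4-18-1 network has 109 free parameters, the 10-1-1 network has 13, and the 4-predictor logistic model has 5, which gives a rough sense of how much more flexible the first network is than a linear predictor on the same inputs.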
Keywords/Search Tags:Methodologies, Data, Linear, Driven, Pattern recognition, Theory, Artificial neural, Breast cancer