Machine Learning Methods with Hierarchical Data

Posted on: 2018-02-25
Degree: M.S.
Type: Thesis
University: University of Colorado Denver, Anschutz Medical Campus
Candidate: Roberts, Katherine M
GTID: 2478390020457476
Subject: Biostatistics
Abstract/Summary:
Generalized Linear Mixed Modeling (GLMM) is an established method for classifying and predicting disease outcomes in radiology. This paper compares several machine learning methods for analyzing hierarchically structured, unbalanced data with a dichotomous outcome. The goal is to determine whether the hierarchical structure of such data matters when choosing among these methods.

The methods compared with GLMM were two-way Naive Bayes (NB), Penalized Linear Discriminant Analysis (PDA), and Random Forests (RF). While all methods were applied to the dataset naively (i.e., without taking the hierarchy into account), this paper also presents an extension of PDA and RF that incorporates first-level data from the hierarchical structure. Cross-validation used both 60/40 validation sets (training and testing partitions) and leave-one-out cross-validation (LOOCV).

Data were simulated to assess how well these techniques perform under different correlation (between-subject variance) and sample-size settings. Evaluation criteria were ROC curves with AUC (95% CI), Youden indices, sensitivity, specificity, and prediction accuracy.

For this particular dataset, we found no gain in prediction accuracy over GLMM. Sensitivities and specificities differed across methods; further evaluation on more robust data, along with additional work to improve and extend the machine learning functions presented here, is warranted.
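The evaluation criteria named above have standard definitions: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), Youden's J = sensitivity + specificity - 1, and AUC equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative case (ties counted as one half). The following is a minimal Python sketch of those formulas, not the thesis's own code; all function names are hypothetical, labels are assumed to be coded 0/1, and the 95% CI for AUC is omitted.

```python
from typing import Optional, Sequence, Tuple


def confusion_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity); assumes both classes are present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's J statistic."""
    return sensitivity + specificity - 1.0


def auc_mann_whitney(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def best_youden_threshold(
    y_true: Sequence[int], scores: Sequence[float]
) -> Tuple[float, Optional[float]]:
    """Try each observed score as a cutoff (predict 1 if score >= cutoff); keep the first max J."""
    best_j, best_cut = -1.0, None
    for cut in sorted(set(scores)):
        preds = [1 if s >= cut else 0 for s in scores]
        j = youden_index(*confusion_rates(y_true, preds))
        if j > best_j:
            best_j, best_cut = j, cut
    return best_j, best_cut
```

Scanning the observed scores as cutoffs and keeping the maximum J is one common way to choose an operating point on the ROC curve; the abstract does not state how the thesis selected its thresholds.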
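The abstract does not specify the simulation model. One common way to generate hierarchical dichotomous data with a tunable between-subject variance is a random-intercept logistic model, logit P(y_ij = 1) = beta0 + beta1 * x_ij + b_i with b_i ~ N(0, sigma_b^2); on the latent logistic scale this implies an intraclass correlation of sigma_b^2 / (sigma_b^2 + pi^2/3). The sketch below assumes that model, a single standard-normal covariate, and hypothetical parameter names (beta0, beta1, sigma_b). It also assumes the 60/40 split is made at the subject level so that no subject appears in both partitions; the thesis may have split differently.

```python
import math
import random


def simulate_hierarchical_binary(n_subjects, obs_per_subject, sigma_b,
                                 beta0=-1.0, beta1=1.0, seed=0):
    """Assumed random-intercept logistic model; returns (subject_id, x, y) rows."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_subjects):
        b_i = rng.gauss(0.0, sigma_b)  # subject-level (between-subject) random intercept
        for _ in range(obs_per_subject):
            x = rng.gauss(0.0, 1.0)  # hypothetical first-level covariate
            eta = beta0 + beta1 * x + b_i  # linear predictor on the logit scale
            p = 1.0 / (1.0 + math.exp(-eta))  # inverse logit
            rows.append((i, x, 1 if rng.random() < p else 0))
    return rows


def latent_icc(sigma_b):
    """Intraclass correlation on the latent logistic scale."""
    return sigma_b ** 2 / (sigma_b ** 2 + math.pi ** 2 / 3)


def split_by_subject(rows, train_frac=0.6, seed=0):
    """Assign whole subjects to train or test (assumed subject-level 60/40 split)."""
    subjects = sorted({r[0] for r in rows})
    random.Random(seed).shuffle(subjects)
    train_ids = set(subjects[:round(train_frac * len(subjects))])
    train = [r for r in rows if r[0] in train_ids]
    test = [r for r in rows if r[0] not in train_ids]
    return train, test
```

Varying sigma_b (and hence the implied ICC) together with n_subjects and obs_per_subject gives a grid of correlation and sample-size settings like the ones the abstract describes.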
Keywords/Search Tags: Machine learning, Methods, Data, GLMM, Hierarchical