Machine learning methods for decision support in health policy for developing countries

Posted on: 2010-04-18
Degree: Ph.D
Type: Dissertation
University: Carnegie Mellon University
Candidate: Green, Sean T
Full Text: PDF
GTID: 1449390002976797
Subject: Engineering
Abstract/Summary:
Diarrheal illness is a major cause of death worldwide that disproportionately affects people in developing countries. Studies have attempted to determine the factors that contribute to diarrheal illness in a country, but the results of many studies have limited applicability beyond the countries included in the study, for which data are available. Much of the data that exist are of poor quality: missing values abound and uncertainty estimates regarding the data are absent. In this dissertation I explore the use of machine learning techniques for working with missing data and demonstrate their use on a data set used to predict diarrheal illness in 192 countries worldwide. Three techniques for data imputation are compared: the bootstrap method, Classification and Regression Trees imputation, and Expectation-Maximization algorithm imputation. The imputed results are used to build a regression tree model that predicts diarrheal illness worldwide based on country-level indicators. This dissertation also explores the use of variable importance measures for random forests to identify the most important variables for predicting diarrheal illness levels in a country, and presents a new method for calculating variable importance that provides information that existing variable importance measures lack.
Keywords/Search Tags:Countries, Diarrheal illness, Variable importance
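
The abstract refers to existing variable importance measures but does not describe the new method, so the following is only a minimal sketch of the general permutation idea behind one common existing measure: shuffle one predictor at a time and record how much the prediction error grows. It is not the dissertation's method. The function names, the toy data, and the stand-in `predict` function are all hypothetical, and a fixed function is used in place of a fitted random forest to keep the example self-contained.

```python
import random

def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, rows, targets, n_repeats=20, seed=0):
    """Mean increase in MSE when each feature column is shuffled.

    `predict` is any function mapping one row (a list of feature values)
    to a number; a fitted model's prediction function would go here.
    """
    rng = random.Random(seed)
    baseline = mean_squared_error(targets, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving every other column intact.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            mse = mean_squared_error(targets, [predict(r) for r in shuffled])
            increases.append(mse - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical toy data: the target depends only on feature 0.
data_rng = random.Random(1)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
targets = [3.0 * r[0] for r in rows]

def predict(row):
    # Stand-in for a fitted model (assumption, not from the source).
    return 3.0 * row[0]

importances = permutation_importance(predict, rows, targets)
print(importances)
```

In this toy setup, shuffling feature 1 leaves every prediction unchanged, so its importance is zero, while shuffling feature 0 raises the error. With country-level indicators, the same loop would be run over a fitted model and, ideally, held-out data.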