
Predicting Inpatient Length of Stay in Western New York Health Service Area Using Machine Learning Algorithm

Posted on: 2018-07-07
Degree: M.S
Type: Thesis
University: State University of New York at Binghamton
Candidate: Salah, Haya
Full Text: PDF
GTID: 2474390020457474
Subject: Industrial Engineering
Abstract/Summary:
The main purpose of this thesis is to compare and analyze different classification models for predicting the length of stay (LOS) of the population of the Western NY health service area. Twelve classification models were used, including individual classifiers, ensemble methods, and deep learning. Based on the literature review conducted in this thesis, this is the first attempt to examine the performance of deep learning methods in LOS prediction. The data used for this research was obtained from the health.data.ny.gov website. It contains basic record-level details on the discharge of inpatients in the State of New York in different health service areas in 2012, including age, gender, race, health service area, facility ID, diagnosis, patient disposition, length of stay, payment methods, etc. Only the records of inpatients in the Western NY health service area were considered. The methodology consists of three major parts: data preprocessing, training the prediction models, and evaluating the performance of the classification models. In data preprocessing, four steps were performed: (1) treating the missing values, (2) binning the target (LOS) into three classes (low, medium, and high) so that classification models could be applied to the data, (3) conducting a correlation test to eliminate redundant features, and (4) performing feature selection to identify the most significant features related to LOS using two filter techniques, namely Chi-square (chi2) and Mutual Information (MI).
The last step in data preprocessing was the transformation of categorical variables into dummy variables, which was performed in two steps. First, SPSS was used to transform the categorical values into ordinal numbers based on the levels of each variable; second, the "OneHotEncoder" in Python was used to transform the ordinal numbers into dummy variables. After the data preprocessing step, the data was divided into two sets: a training set (70%) and a testing set (30%). The models were trained on the training data set and tested on the testing data set.
Based on the experimental results, using the features selected by the chi-square test resulted in higher training performance than using the features selected by MI. The performance of the models was compared based on the confusion matrix, accuracy, precision, recall, and F1-score. The deep learning method achieved the highest prediction accuracy, precision, recall, and F1-score on the testing data set (88.5%, 88%, 89%, and 88%, respectively) compared to the other classification models. Based on these results, it can be concluded that deep learning models have good potential for predicting patient LOS.
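The binning, chi-square scoring, and dummy-variable steps described above can be illustrated with a minimal plain-Python sketch. It is not the thesis's implementation: the cut points (3 and 7 days), the toy records, and the helper names `bin_los`, `chi_square_score`, and `one_hot` are assumptions made for illustration, and `one_hot` is a hand-written stand-in for the "OneHotEncoder" mentioned in the abstract. The score shown is the Pearson chi-square statistic of a feature-by-class contingency table; the abstract does not say exactly how the chi2 filter was computed.

```python
from collections import Counter

# Hypothetical cut points (in days); the abstract does not state the real ones.
LOW_MAX_DAYS = 3
MEDIUM_MAX_DAYS = 7


def bin_los(days):
    """Map a length of stay in days to one of three class labels."""
    if days <= LOW_MAX_DAYS:
        return "low"
    if days <= MEDIUM_MAX_DAYS:
        return "medium"
    return "high"


def chi_square_score(feature_values, class_labels):
    """Pearson chi-square statistic of the feature-by-class contingency table."""
    n = len(feature_values)
    observed = Counter(zip(feature_values, class_labels))
    feature_totals = Counter(feature_values)
    class_totals = Counter(class_labels)
    score = 0.0
    for f, f_count in feature_totals.items():
        for c, c_count in class_totals.items():
            expected = f_count * c_count / n
            score += (observed[(f, c)] - expected) ** 2 / expected
    return score


def one_hot(codes):
    """Turn category codes into dummy (0/1) rows, one column per sorted code."""
    categories = sorted(set(codes))
    return [[1 if code == cat else 0 for cat in categories] for code in codes]


# Toy records, invented for illustration (not taken from the data set).
los_days = [1, 2, 5, 6, 9, 12]
age_group = ["young", "young", "mid", "mid", "old", "old"]
gender = ["F", "M", "F", "M", "F", "M"]

los_class = [bin_los(d) for d in los_days]
age_score = chi_square_score(age_group, los_class)    # strongly related to LOS class
gender_score = chi_square_score(gender, los_class)    # unrelated in this toy data
age_dummies = one_hot(age_group)
```

In this toy data the age feature separates the three LOS classes perfectly and therefore scores far higher than gender, which is the kind of ranking a filter method uses to keep or drop features.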
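The 70/30 split and the evaluation metrics named above can be sketched as follows. This is an illustrative plain-Python version under stated assumptions, not the thesis's code: the helper names, the fixed random seed, the toy labels, and the use of macro averaging across the three classes are all assumptions, since the abstract does not say how precision, recall, and F1-score were averaged.

```python
import random
from collections import Counter

CLASSES = ["low", "medium", "high"]


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and split them into training and testing sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def confusion_matrix(y_true, y_pred):
    """Counts indexed as matrix[true_class][predicted_class]."""
    counts = Counter(zip(y_true, y_pred))
    return {t: {p: counts[(t, p)] for p in CLASSES} for t in CLASSES}


def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 (averaging is an assumption)."""
    matrix = confusion_matrix(y_true, y_pred)
    accuracy = sum(matrix[c][c] for c in CLASSES) / len(y_true)
    precisions, recalls, f1s = [], [], []
    for c in CLASSES:
        tp = matrix[c][c]
        predicted = sum(matrix[t][c] for t in CLASSES)  # column total
        actual = sum(matrix[c].values())                # row total
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    k = len(CLASSES)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


# Toy example: ten record ids split 70/30, and six invented predictions.
train_rows, test_rows = train_test_split(range(10))
y_true = ["low", "low", "medium", "medium", "high", "high"]
y_pred = ["low", "medium", "medium", "medium", "high", "low"]
scores = macro_scores(y_true, y_pred)
```

Reporting all four metrics alongside the confusion matrix, as the thesis does, helps show whether a model's accuracy is spread evenly over the low, medium, and high classes or driven by one of them.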
Keywords/Search Tags:Health service area, Models, LOS, Deep learning, Length, Data, Thesis, Western