Predicting high-cost patients in general population using data mining techniques

Posted on:2013-05-31

Degree:M.Sc

Type:Thesis

University:University of Ottawa (Canada)

Candidate:Izad Shenas, Seyed Abdolmotalleb

GTID:2458390008984851

Subject:Health Sciences

Abstract/Summary:

In this research, we apply data mining techniques to nationally representative US expenditure data to predict, among the general population, very high-cost patients, i.e. those in the top 5 percentiles of cost. Samples are derived from the Medical Expenditure Panel Survey (MEPS) Household Component data for 2006-2008, comprising 98,175 records. After pre-processing, partitioning, and balancing the data, the final MEPS dataset of 31,704 records is modeled with Decision Trees (including C5.0 and CHAID) and Neural Networks. Multiple predictive models are built, and their performance is analyzed using several measures, including accuracy, G-mean, and the Area Under the ROC Curve (AUC). We conclude that the CHAID tree returns the best G-mean and AUC among the top-performing predictive models, with values ranging from 76% to 85% and from 0.812 to 0.942, respectively. Among an initial set of 66 attributes, the best predictors of membership in the top 5% high-cost population are the individual's overall health perception, history of blood cholesterol checks, history of physical/sensory/mental limitations, age, and history of colonic prevention measures. It is worth noting that we do not use the number of visits to care providers as a predictor, since it is highly correlated with expenditure and offers no new insight into the data (i.e. it is a trivial predictor). We predict high-cost patients without knowing how many times a patient visited doctors or was hospitalized. Consequently, the results of this study can be used by policy makers, health planners, and insurers to plan and improve the delivery of health services.
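The abstract names its target definition and evaluation measures but not how they are computed. The Python sketch below illustrates them on toy data, under stated assumptions: the high-cost label is taken to be a simple top-5% expenditure cutoff, and "balancing" is assumed to mean random undersampling of the majority class (the abstract does not say which balancing method was used). All function names are hypothetical, and the C5.0, CHAID, and neural-network models themselves are not reproduced; G-mean is computed as the geometric mean of sensitivity and specificity, and AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative.

```python
import math
import random
from typing import List, Sequence


def label_high_cost(expenditures: Sequence[float], top_fraction: float = 0.05) -> List[int]:
    """Label a record 1 if its expenditure falls in the top `top_fraction` of the sample, else 0."""
    n = len(expenditures)
    k = max(1, round(n * top_fraction))  # target number of high-cost records
    cutoff = sorted(expenditures)[n - k]  # smallest expenditure inside the top k
    # Ties at the cutoff are all labelled 1, so slightly more than k records may be positive.
    return [1 if x >= cutoff else 0 for x in expenditures]


def undersample(labels: Sequence[int], seed: int = 0) -> List[int]:
    """ASSUMED balancing step: keep every positive and an equal-sized random sample of negatives.

    Returns the sorted indices of the retained records.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    kept_neg = rng.sample(neg, min(len(pos), len(neg)))
    return sorted(pos + kept_neg)


def g_mean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Geometric mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present in y_true")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    costs = [float(c) for c in range(100)]  # toy expenditures, not MEPS data
    labels = label_high_cost(costs)
    balanced = undersample(labels)
    print(sum(labels), len(balanced))
    print(g_mean([1, 1, 0, 0], [1, 0, 0, 0]))
    print(auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))
```

The pairwise AUC loop is quadratic in the number of records and is meant only to show the definition; for a dataset the size of the final MEPS sample, a rank-based implementation such as scikit-learn's `roc_auc_score` would be the practical choice. G-mean is useful here because, unlike plain accuracy, it stays low when a model ignores the rare high-cost class.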

Keywords/Search Tags:

Data, High-cost patients, Population

Related items

1	The Construction Of Population Management Information System Based On B/S Structures
2	The Data Mining And Data Analysis In The Expense Of Insured Patients For Medical Treatment
3	High resolution satellite images and LiDAR data for small-area building extraction and population estimation
4	Design And Implementation Of Population Information Management System For The Public Security Department
5	Design And Implementation Of Information Management System Of High-risk Migrant Population For Weifang
6	Modeling And Forecasting The Age Structure Of Population
7	Application Of Data Mining Technology In The Inner Mongolia Autonomous Region Population Data
8	Study On Quality Cost Management System In Modern Manufacturing Enterprises
9	Cost-effective programming for maximum power-efficiency of data centric applications on FPGAs
10	Research And Design Of Management System For City’s Floating Population