Feature selection methods for intelligent systems classifiers in healthcare

Posted on: 2002-04-14
Degree: Ph.D.
Type: Dissertation
University: Loyola University of Chicago
Candidate: Cullen, Phyllis Palka
Full Text: PDF
GTID: 1468390011992285
Subject: Health Sciences
Abstract/Summary:
Data mining uses a variety of techniques to detect patterns related to health states and outcomes that are not easily detected using traditional statistical methods. Feature selection is the step in which clusters of potentially important variables are identified. The study examined feature selection methods for an intelligent systems classifier (ICS), a computer system that learns. The example used for classification was binary self-reported activity status relative to others of the same age and gender. The sample comprised the 20,050 adult cases in the NHANES III national health survey. The independent variables were feature selection method (filter and wrapper) and psychosocial feature inclusion status (ablated or unablated). The dependent variable was classification performance. The method was to generate four feature sets using genetic algorithms, build a neural network with each feature set, test the classification performance of each neural network, and compare the results. Judged descriptively by error rate, a contingency matrix, and area under the receiver operating characteristic curve (AUROC), there appeared to be no difference between sets when the classes were combined. However, sensitivity (positive class 1) was lower than specificity in all four sets. When the psychosocial features were included (unablated) in the search space along with laboratory values and physical exam, the genetic algorithm (GA) wrapper (non-linear) performed better than the filter (linear). The number of features was reduced by approximately 90%. A Z test showed a statistically significant difference between the AUROC of each set and random classification performance (p < .001, SE = .008, two-tailed, non-parametric, n = 5048), but no pair of sets differed significantly at an alpha of 0.05. A search of healthcare indexes returned no entries in which NHANES III was used with any artificial intelligence method.
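The comparison of an AUROC against random classification performance (AUROC = 0.5) with a Z statistic can be sketched as follows. This is a minimal illustration, not the dissertation's procedure: the function names are hypothetical, and the Hanley-McNeil standard error is an assumed choice, since the abstract says only that the estimate was non-parametric.

```python
import math


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive case scores higher; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of an AUROC using the Hanley-McNeil approximation
    (an assumption here; the source does not name its SE method)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc * auc)
           + (n_neg - 1) * (q2 - auc * auc)) / (n_pos * n_neg)
    return math.sqrt(var)


def z_vs_chance(auc, se):
    """Z statistic and two-tailed p-value for H0: AUROC = 0.5."""
    z = (auc - 0.5) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p
```

With a standard error as small as the reported .008, even a modest departure from 0.5 yields a large Z, which is consistent with every set differing from chance while no two sets differ from each other.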
Genetic algorithms were empirically shown to be useful for feature selection with the high-dimensional, mixed, diverse data produced by healthcare. No reported use of a genetic algorithm was found in the nursing research literature.
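A genetic algorithm for feature selection searches over bit masks, where each bit marks a feature as included or excluded. The sketch below shows the general idea only; the operators (tournament selection, one-point crossover, bit-flip mutation, elitism), all parameter values, and the names `ga_select` and `toy_fitness` are assumptions, not details from the dissertation. In a wrapper approach the `fitness` callable would train and score a classifier on the masked features; here a toy function stands in for it.

```python
import random


def ga_select(n_features, fitness, pop_size=20, generations=30,
              crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Return the best feature mask (list of 0/1) found by a simple GA.
    Requires n_features >= 2 so that a crossover point exists."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]

    def tournament():
        # Pick two random masks and keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(pop, key=fitness)
    for _ in range(generations):
        next_pop = [best[:]]  # elitism: carry the best mask forward
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Flip each bit independently with probability mutation_rate.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_pop.append(child)
        pop = next_pop
        best = max(pop, key=fitness)
    return best


# Hypothetical stand-in for a classifier score: reward three "informative"
# features and charge a small cost per selected feature.
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask):
    hits = sum(1 for i in INFORMATIVE if mask[i])
    return hits - 0.1 * sum(mask)


best_mask = ga_select(10, toy_fitness)
```

The per-feature cost in the toy fitness plays the role of the pressure toward smaller feature sets that makes a large reduction in feature count possible.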
Keywords/Search Tags: Feature selection, Classification performance, Methods