A comparison of genetic algorithms and other machine learning systems on a complex classification task from common disease research

Posted on: 1996-10-06
Degree: Ph.D
Type: Thesis
University: University of Michigan
Candidate: Congdon, Clare Bates
Full Text: PDF
GTID: 2468390014988009
Subject: Computer Science
Abstract/Summary:
This thesis investigates several well-known machine learning systems and evaluates their utility when applied to a classification task from the field of human genetics. This common-disease research task, an inquiry into genetic and biochemical factors and their association with a family history of coronary artery disease (CAD), is more complex than many pursued in machine learning research, due to interactions and the inherent noise in the dataset. The task also differs from most pursued in machine learning research because there is a desire to explain the dataset with a small number of rules, even at the expense of accuracy, so that the rules will be more accessible to medical researchers who are unaccustomed to dealing with disjunctive explanations of data. Furthermore, the task is asymmetric: good explanations of the positive examples are more important than good explanations of the negative examples.

The primary machine learning approach investigated in this research is genetic algorithms (GAs); decision trees, Autoclass, and Cobweb are also included. The GA performed best in terms of descriptive ability on the common-disease research task, although decision trees also demonstrated certain strengths. Autoclass and Cobweb were recognized from the outset as inappropriate for the needs of common-disease researchers (because both systems are unsupervised learners that create probabilistic structures), but were included because of their interest to the machine learning community; these systems did not perform as well as GAs and decision trees in terms of their ability to describe the data. In terms of predictive accuracy, all systems performed poorly, and the differences between any two of the three best systems are not significant. When positive and negative examples are considered separately, the GA does significantly better than the other systems in predicting positive examples and significantly worse in predicting negative examples.

The thesis illustrates that investigating "real" problems from researchers in other fields can lead machine learning researchers to challenge their systems in ways they might not otherwise have considered, and may lead these researchers to a symbiotic relationship that benefits multiple research communities.
Keywords/Search Tags:Machine learning, Systems, Task, Researchers, Genetic