Applying machine learning to biomedical data: The small-sample and interpretability dilemmas

Posted on: 2006-09-03
Degree: Ph.D
Type: Thesis
University: The Johns Hopkins University
Candidate: D'Avignon-Aubut, Christian
Full Text: PDF
GTID: 2458390008976341
Subject: Engineering
Abstract/Summary:
Whereas crucial biomedical data are being assembled at an increasingly rapid pace, the actual quantity of data given the complexity of the underlying problems, even in the best situations, inescapably remains vanishingly small for standard statistical learning techniques. Simplifications are acutely needed to enable learning and inferences to forestall some major pitfalls; apparent performance at various tasks would otherwise surely be misleading and overly optimistic. Possible simplifying principles are herein studied, and suitable ones are exploited to devise simple and biologically interpretable classifiers used to tackle some interesting and challenging problems, e.g., the classification of gene expression profiles. The normality hypothesis, a major distributional assumption regarding gene expression data, is first examined. Another possible simplification is then studied: replacing the expressions by mere ranks. Simple classifiers based upon this are shown to be both transparent and very efficacious for several problems such as the detection of dilated cardiomyopathy or the identification of particular tumor types. It is also shown how ranks may prove useful not only for classification, but for modeling as well. Continuing with simplicity-driven biomedical classifiers, trees built upon simplified "spike train" data further demonstrate how such interpretable classifiers may naturally lead to coherent inferences, as illustrated for peripheral auditory system amplitude and frequency encoding.
Keywords/Search Tags: Data, Biomedical, Classifiers