Data mining and domain knowledge: An exploration of methods to advance medical research

Posted on: 2014-08-14
Degree: Ph.D.
Type: Thesis
University: University of Maryland, Baltimore County
Candidate: Engle, Kelley M
Full Text: PDF
GTID: 2458390005484743
Subject: Computer Science
Abstract/Summary:
Researchers in the medical domain consider the double-blind, placebo-controlled clinical trial the gold standard. The data for these trials are collected to test a specifically defined hypothesis, and very little secondary data analysis is conducted. The underlying purpose of this work is to demonstrate the value and relevance of data mining and artificial intelligence methods for both pre-processing needs and secondary data analyses in medical research. The selected medical domain for this demonstration is autism, in particular the data from IAN (Interactive Autism Network) obtained from Kennedy Krieger.

During the process of predictive model building, numerous research issues were addressed at different phases. Two contributions resulted: (1) solutions to statistical issues with metric-based data mining methods and (2) guidelines for incorporating domain knowledge in data mining.

Various statistical methods used in data mining, such as Naive Bayes, require metric data to ensure reliable and robust results. Many public health data sources, including the IAN dataset, consist primarily of non-metric data in the form of Likert scales and categorical data. MDS (Multi-Dimensional Scaling) is presented as a method that can effectively transform non-metric data to metric data.

For incorporating domain knowledge in data mining, the initial work of integrating autism domain knowledge in multi-level association rule mining is presented. Through the use of an external treatment ontology, more interesting association rules were extracted for autism treatments. To further explore the role of knowledge guidance, the hypothesis was that knowledge-guided mutation applied to classification rules will affect the search trajectory incrementally. This hypothesis builds on the underlying premise of gradualness. A pilot study and full-scale experiments were conducted in which knowledge from the autism domain, in the form of a drug taxonomy and an autism comorbidity semantic net, guided the mutation of classification rules. The experiments for the drug taxonomy confirmed the hypothesis that domain knowledge can be utilized to constrain the search space. This research is both novel and significant, as it provides a practical resource for health informatics researchers who want to incorporate domain knowledge into data mining and artificial intelligence models.
Keywords/Search Tags: Data, Domain, Medical, Methods