
Optimized decision fusion of heterogeneous data for breast cancer diagnosis

Posted on: 2008-08-17
Degree: Ph.D.
Type: Dissertation
University: Duke University
Candidate: Jesneck, Jonathan Lee
Full Text: PDF
GTID: 1444390005969824
Subject: Engineering
Abstract/Summary:
As more diagnostic testing options become available to physicians, it becomes more difficult to combine the various types of medical information in order to optimize the overall diagnosis. To improve diagnostic performance, here we introduce an approach to optimize a decision-fusion technique to combine heterogeneous information, such as from different modalities, feature categories, or institutions. This dissertation presents a computer aid known as optimized decision fusion, and explores both its underlying theory and practical application.

The purpose of this work was (1) to present optimized decision fusion, a classification algorithm designed for noisy, heterogeneous data sets with few samples, and (2) to evaluate decision fusion's classification ability on clinical, heterogeneous breast cancer data sets. This study used the following three clinical data sets: heterogeneous breast mass lesions, heterogeneous breast microcalcification lesions, and breast blood serum protein levels. In addition to these clinical data sets, we also used various simulated data sets.

We used two variants of our decision fusion algorithm: (1) DF-A, which optimized the area (AUC) under the receiver operating characteristic (ROC) curve, and (2) DF-P, which optimized the high-sensitivity partial area (pAUC) under the curve. We compared decision fusion's classification performance to that of the following classifiers: linear discriminant analysis (LDA), an artificial neural network (ANN), classical regression models (linear, logistic, and probit), Bayesian model averaging of these regression models, least angle regression, and a support vector machine.

The simulation studies showed that decision fusion is able to maintain high classification performance on data sets with many weak features and few samples, although performance was lowered by feature correlations.

For the calcification data set, DF-A outperformed the other classifiers in terms of AUC (p < 0.02) and achieved AUC = 0.85 +/- 0.01. DF-P surpassed the other classifiers in terms of pAUC (p < 0.01) and reached pAUC = 0.38 +/- 0.02. For the mass data set, DF-A outperformed both the ANN and the LDA (p < 0.04) and achieved AUC = 0.94 +/- 0.01. Although for this data set there were no statistically significant differences among the classifiers' pAUC values (pAUC = 0.57 +/- 0.07 to 0.67 +/- 0.05, p > 0.10), DF-P did significantly improve specificity versus the LDA at both 98% and 100% sensitivity (p < 0.04).

For the data set of blood serum proteins, there were no statistically significant differences among the classifiers for distinguishing normal tissue from malignant lesions (AUC = 0.79 to 0.84, p > 0.12), but decision fusion was able to achieve significantly higher specificity, 60%, at 90% sensitivity (p < 0.02). For the task of distinguishing benign from malignant lesions, the other classifiers performed very poorly (AUC = 0.50 to 0.57), and decision fusion achieved the best performance at AUC = 0.64 (p < 0.05). The proteins were probably indicative of secondary effects, such as inflammatory response, rather than specific for cancer.

In conclusion, decision fusion directly optimized clinically significant performance measures such as AUC and pAUC, and sometimes outperformed other machine-learning techniques when applied to three different breast cancer data sets. By testing on a wide variety of simulated and clinical data sets, we show that decision fusion is robust to noisy data and can handle heterogeneous data structures when given relatively few observations.
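
The performance measures named in the abstract (AUC, high-sensitivity pAUC, and specificity at a fixed sensitivity) can be illustrated with a minimal Python sketch. It is not the dissertation's implementation. It assumes an empirical ROC curve with trapezoidal integration, a sensitivity floor of 0.90 for the partial area, and normalization of that partial area by (1 - floor) so that a perfect classifier scores 1.0; none of these choices is stated in the abstract. All function names and the toy scores are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Point]:
    """Empirical ROC points from (0, 0) to (1, 1); tied scores share one point."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every case tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Point]) -> float:
    """Trapezoidal area under the full ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def high_sensitivity_pauc(points: Sequence[Point], min_tpr: float = 0.9) -> float:
    """Area between the ROC curve and the line TPR = min_tpr, over (1 - min_tpr).

    The 0.9 floor and the normalization are assumptions of this sketch.
    """
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y1 <= min_tpr:
            continue  # segment lies at or below the sensitivity floor
        if y0 < min_tpr:
            # Clip the segment where it crosses TPR = min_tpr.
            frac = (min_tpr - y0) / (y1 - y0)
            x0, y0 = x0 + frac * (x1 - x0), min_tpr
        area += (x1 - x0) * ((y0 - min_tpr) + (y1 - min_tpr)) / 2.0
    return area / (1.0 - min_tpr)


def specificity_at_sensitivity(points: Sequence[Point], sensitivity: float) -> float:
    """Best specificity among empirical operating points reaching the sensitivity."""
    return 1.0 - min(fpr for fpr, tpr in points if tpr >= sensitivity)


# Toy data invented for illustration: label 1 = malignant, 0 = benign.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_roc = roc_points(toy_scores, toy_labels)
toy_auc = auc(toy_roc)
toy_pauc = high_sensitivity_pauc(toy_roc, min_tpr=0.9)
toy_spec = specificity_at_sensitivity(toy_roc, sensitivity=1.0)
```

Under this normalization a chance-level classifier has AUC = 0.5 but a pAUC of only 0.05 at the 0.90 floor, which is one reason two classifiers with similar AUC can differ noticeably in pAUC or in specificity at high sensitivity.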
Keywords/Search Tags: Decision fusion, Data, Breast cancer, AUC