
Support vector machines in data mining

Posted on: 2002-02-27
Degree: Ph.D.
Type: Dissertation
University: The University of Wisconsin - Madison
Candidate: Lee, Yuh-Jye
Full Text: PDF
GTID: 1468390011498209
Subject: Computer Science
Abstract/Summary:
Our smooth support vector machine (SSVM) algorithm generates an SVM classifier by solving a strongly convex unconstrained minimization problem without using any optimization package. We use a Newton-Armijo algorithm that can be shown to converge globally and quadratically to the unique solution of the SSVM. Numerical results are given to demonstrate the effectiveness and speed of SSVM.

We present a novel rectangular kernel idea for nonlinear SSVM that can be applied to all nonlinear kernel algorithms to avoid the difficulty of dealing with a huge dense kernel matrix. This leads to our reduced support vector machine (RSVM) algorithm. The RSVM algorithm uses a random subset of the dataset to generate a nonlinear classifier. Instead of solving an unconstrained minimization problem in $m + 1$ variables, where $m$ is the number of data points, RSVM solves an unconstrained minimization problem in $\bar{m} + 1$ variables, where $\bar{m}$ is the size of the reduced set and $\bar{m} \ll m$. This reduces computational complexity from $O((m + 1)^3)$ to $O((\bar{m} + 1)^3)$. In addition, we only need to keep the reduced set to characterize the nonlinear classifier for classifying new unseen data. Numerical results show that RSVM is much faster than other methods and has similar or better test set correctness.

An application of data mining is the identification of breast cancer patients for whom chemotherapy could prolong longevity. We cluster 253 breast cancer patients into three prognostic groups: Good, Poor and Intermediate. Each of the three groups has a significantly distinct survival curve. Of particular significance is the Intermediate group, because patients with chemotherapy in this group do better than those without chemotherapy in the same group. This reverses the pattern in the overall population of 253 patients, in which patients undergoing chemotherapy have worse survival than those who do not. We also prescribe a procedure for classifying breast cancer patients into the three prognostic groups above. Based on our survival curve analysis, these results suggest that patients in the Good group should not receive chemotherapy, while Intermediate group patients should.
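
To make the SSVM and RSVM ideas in the abstract more concrete, here is a minimal NumPy sketch, not the dissertation's own code. It assumes the smoothed objective usually associated with SSVM, $\frac{\nu}{2}\|p(e - D(Gz), \alpha)\|^2 + \frac{1}{2} z^\top z$ with $p(x, \alpha) = x + \frac{1}{\alpha}\log(1 + e^{-\alpha x})$, a Gaussian kernel, and a rectangular kernel matrix built from a random reduced set. The function names (`gaussian_kernel`, `newton_armijo`, `rsvm_fit`, `rsvm_predict`) and the parameter values (`mu`, `nu`, `alpha`, `delta`) are illustrative assumptions. The label signs of the reduced set are absorbed into the coefficient vector `u`, which is a simplification.

```python
import numpy as np

def gaussian_kernel(A, B, mu):
    """Rectangular Gaussian kernel: entry (i, j) is exp(-mu * ||A_i - B_j||^2)."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-mu * np.maximum(sq, 0.0))

def smooth_plus(x, alpha):
    """Smooth approximation of max(x, 0): x + log(1 + exp(-alpha*x)) / alpha."""
    return np.logaddexp(0.0, alpha * x) / alpha  # algebraically equal, overflow-safe

def objective(z, G, d, nu, alpha):
    r = 1.0 - d * (G @ z)  # margin residuals e - D(Gz)
    return 0.5 * nu * np.sum(smooth_plus(r, alpha) ** 2) + 0.5 * z @ z

def newton_armijo(G, d, nu=10.0, alpha=5.0, tol=1e-6, max_iter=50, delta=1e-4):
    """Newton direction with Armijo step halving on the smoothed objective."""
    n = G.shape[1]
    z = np.zeros(n)
    for _ in range(max_iter):
        r = 1.0 - d * (G @ z)
        p = smooth_plus(r, alpha)
        s = 0.5 * (1.0 + np.tanh(0.5 * alpha * r))  # p'(r), a sigmoid
        grad = z - nu * G.T @ (d * p * s)
        if np.linalg.norm(grad) < tol:
            break
        curv = s**2 + alpha * p * s * (1.0 - s)      # d/dr of p(r) * p'(r)
        hess = np.eye(n) + nu * (G.T * curv) @ G     # positive definite
        step = np.linalg.solve(hess, -grad)
        f0, slope, lam = objective(z, G, d, nu, alpha), grad @ step, 1.0
        # Armijo rule: halve the step until there is sufficient decrease.
        while lam > 1e-10 and objective(z + lam * step, G, d, nu, alpha) > f0 + delta * lam * slope:
            lam *= 0.5
        z = z + lam * step
    return z

def rsvm_fit(A, d, m_bar, mu=0.1, nu=10.0, seed=0):
    """Fit on an m x m_bar rectangular kernel built from a random reduced set."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(A.shape[0], size=m_bar, replace=False)
    A_bar = A[idx]                                   # reduced set, m_bar rows
    K = gaussian_kernel(A, A_bar, mu)                # m x m_bar, not m x m
    G = np.hstack([K, -np.ones((A.shape[0], 1))])    # last column carries gamma
    z = newton_armijo(G, d, nu=nu)                   # m_bar + 1 unknowns
    return A_bar, z[:-1], z[-1]

def rsvm_predict(X, A_bar, u, gamma, mu=0.1):
    """Classify new points using only the reduced set."""
    return np.where(gaussian_kernel(X, A_bar, mu) @ u - gamma >= 0.0, 1.0, -1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
    d = np.hstack([-np.ones(100), np.ones(100)])
    A_bar, u, gamma = rsvm_fit(A, d, m_bar=10)
    print("training accuracy:", np.mean(rsvm_predict(A, A_bar, u, gamma) == d))
```

In this sketch each Newton step solves a linear system of size $\bar{m} + 1$ rather than $m + 1$, which is where the cubic cost reduction described above would come from, and prediction needs only `A_bar`, `u`, and `gamma`.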
Keywords/Search Tags: Support vector, Unconstrained minimization problem, SSVM, Chemotherapy, Data, Breast cancer patients, RSVM