Scaling up support vector machines

Posted on:2008-08-09

Degree:Ph.D

Type:Thesis

University:Hong Kong University of Science and Technology (Hong Kong)

Candidate:Tsang, Wai-Hung

GTID:2448390005477070

Subject:Computer Science

Abstract/Summary:

Kernel methods, such as support vector machines (SVMs), have been applied successfully to a variety of machine learning problems, including classification, regression, and ranking. Many of them are formulated as quadratic programming (QP) problems, which take O(m^3) time and O(m^2) space, where m is the training set size. They are thus computationally infeasible on massive data sets. By observing that practical SVM implementations only approximate the optimal solution by an iterative strategy, I scale up kernel methods in this thesis by exploiting such "approximateness".

First, I show that SVM classification problems can be equivalently formulated as minimum enclosing ball (MEB) problems in computational geometry. Then, by adopting an efficient approximate MEB algorithm, I obtain provably approximately optimal solutions with the idea of core-sets. My proposed Core Vector Machine (CVM) algorithm can be used with nonlinear kernels and has a time complexity that is linear in m and a space complexity that is independent of m for a fixed approximation factor (1 + epsilon)^2. Experiments on large real-world data sets demonstrate that the CVM is as accurate as existing SVM implementations but is much faster and can handle much larger data sets than existing scale-up methods.

By generalizing the underlying MEB problem to the center-constrained minimum enclosing ball (CCMEB) problem, I extend the CVM algorithm to the regression and ranking settings. Moreover, the condition on the kernel function is relaxed, so the enhanced CVM algorithm can be used with any linear or nonlinear kernel.

Finally, I introduce the enclosing ball (EB) problem, in which the ball's radius is fixed and thus does not have to be minimized. I develop efficient (1 + epsilon)-approximation algorithms that are simple to implement and do not require any sophisticated numerical solver. For the Gaussian kernel in particular, a suitable choice of this (fixed) radius is easy to determine, and the center obtained from the (1 + epsilon)-approximation of this EB problem is close to the center of the corresponding MEB. Experimental results show that the proposed algorithm has accuracies comparable to those of other large-scale SVM implementations, but can handle very large data sets and is in general even faster than the CVM.
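
The core-set idea behind an approximate MEB can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the thesis: the points live in ordinary Euclidean space rather than a kernel-induced feature space, and the center is updated with the simple farthest-point rule of Badoiu and Clarkson instead of the CVM procedure itself. The function name approx_meb and its return values are hypothetical, chosen only for illustration.

```python
import math


def approx_meb(points, eps):
    """Sketch of a (1 + eps)-approximate minimum enclosing ball.

    Assumption: plain Euclidean points and the Badoiu-Clarkson
    farthest-point update; this is NOT the CVM algorithm of the thesis.
    Returns (center, radius, core), where core lists the indices of the
    points that were ever selected as the farthest point.
    """
    if not points:
        raise ValueError("points must be non-empty")
    if eps <= 0:
        raise ValueError("eps must be positive")

    center = list(points[0])  # start from an arbitrary input point
    core = [0]                # indices of the points selected so far
    iterations = math.ceil(1.0 / (eps * eps))

    for i in range(1, iterations + 1):
        # Find the point farthest from the current center.
        far = max(range(len(points)),
                  key=lambda j: math.dist(center, points[j]))
        if far not in core:
            core.append(far)
        # Move the center a shrinking fraction of the way towards it.
        step = 1.0 / (i + 1)
        center = [c + step * (p - c) for c, p in zip(center, points[far])]

    # Radius that makes the ball around `center` enclose every point.
    radius = max(math.dist(center, p) for p in points)
    return center, radius, core


if __name__ == "__main__":
    square = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
    center, radius, core = approx_meb(square, eps=0.1)
    print(center, radius, core)
```

In this sketch the number of iterations, and hence the number of selected points, depends only on eps and not on the number of input points, which is the property that lets a core-set method keep its working set small on large data sets.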

Keywords/Search Tags:

SVM, CVM, Data sets, Vector, MEB, Problem

Related items

1	Study On Imbalanced Data Sets Classification Method And Its Application In Telecommunication
2	Research On Classification Algorithms Of Data Mining Based On Imbalanced Data Sets
3	Combating the class imbalance problem in small sample data sets
4	Support Vector Data Description And Support Vector Machine And Their Applications
5	A Study On Algorithm For Classification Based On Support Vector Data Description
6	The Study Of Several Key Issues On Large Data Sets Classification Techniques In Pattern Recognition
7	Research On Data Stream Classification Based On Granular Computing And F-Rough Sets Extension
8	Pattern Recognition Method And Application Of Research Based On Default Data Sets
9	Research On Fast Training Method Base On Core Vector Machine And Support Vector Machine
10	Research Of Classification On Imbalanced Data Sets And Its Application In Student Loans Credit Risk Management