Support vector machines (SVMs) are linear classifiers based on the margin maximization principle. They implement structural risk minimization, which controls the complexity of the classifier with the aim of achieving good generalization performance. The SVM accomplishes the classification task by constructing, in a higher-dimensional space, the hyperplane that optimally separates the data into two categories. Stochastic gradient descent (SGD) is a simple and effective algorithm for training SVMs. It is particularly fast for linear classification and can also be adapted to non-linear classification via Mercer kernels. Its running time scales linearly with the number of iterations and does not depend on the size of the training set. In this paper, we examine several variants of gradient descent and use them to optimize linear SVMs, in order to determine whether these variants improve the convergence rate and classification accuracy on large data sets. This paper also proposes a MapReduce-based SVM ensemble algorithm with SGD. We use the Hadoop Distributed File System (HDFS) to store the large training set and the MapReduce parallel computing model to train several SVMs as an ensemble. The results show that our methods achieve a faster convergence rate than Pegasos, a standard SGD algorithm for SVMs.
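
As background for the comparison, the following is a minimal sketch of the Pegasos-style SGD update for a linear SVM (hinge loss with L2 regularization). The function name, the hyperparameter defaults, and the omission of Pegasos's optional projection step are illustrative choices, not details taken from the paper.

```python
import numpy as np

def pegasos_sgd(X, y, lam=0.01, n_iters=1000, seed=0):
    """Pegasos-style SGD for a linear SVM.

    X: (n_samples, n_features) array; y: labels in {-1, +1}.
    lam and n_iters are illustrative defaults, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, n_iters + 1):
        i = rng.integers(n)              # sample one training example
        eta = 1.0 / (lam * t)            # Pegasos step-size schedule
        if y[i] * X[i].dot(w) < 1:       # example violates the margin
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:                            # regularization-only shrink step
            w = (1 - eta * lam) * w
    return w

# Usage: predict labels with sign(X @ w).
```

Note that each iteration touches a single randomly drawn example, which is why the cost per step, and hence the total running time for a fixed number of iterations, is independent of the training-set size.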