Data Segmentation For Large-scale Distributed Approximated SVM

Posted on: 2019-08-26  Degree: Master  Type: Thesis
Country: China  Candidate: C Zhang  Full Text: PDF
GTID: 2428330593451067  Subject: Computer technology
Abstract/Summary:
Distributed computing is an important means of scaling machine learning to large datasets. Data segmentation is one of the critical issues in distributed machine learning research, because it affects both the generalization performance and the parallel efficiency of distributed algorithms. Choosing a suitable block number has therefore become a vital topic in model selection for distributed machine learning. Existing approaches to data segmentation rely on empirical evidence or on the number of processors, without an explicit criterion, and so lack justification and interpretability. To address this issue, we propose a parallel-efficiency-sensitive criterion for data segmentation with a generalization-theory guarantee, which improves the computational efficiency of distributed machine learning while retaining test accuracy. We first derive an upper bound on the generalization error with respect to the block number of the data segmentation. Following empirical risk minimization theory, we then define blocked empirical risk minimization. Next, we present a data segmentation criterion that is a trade-off between the generalization error and the parallel efficiency. Finally, we implement large-scale Gaussian kernel support vector machines (SVMs) in the random Fourier feature space within the alternating direction method of multipliers (ADMM) framework on high-performance computing clusters (HPCC), using the proposed data segmentation criterion. Experimental results on several large-scale benchmark datasets show that the proposed criterion is reliable and effective for large-scale SVMs. The criterion is generally applicable and is not restricted to the distributed ADMM framework or to SVMs.
Keywords/Search Tags: Data segmentation, Large-scale SVMs, Model selection, ADMM, Random Fourier features
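
As a rough illustration of two ingredients named in the abstract, the sketch below maps data into a random Fourier feature space that approximates a Gaussian kernel and then splits the mapped data into equal-sized blocks. It is a minimal sketch under stated assumptions, not the thesis implementation: the function names (rff_map, gaussian_kernel, split_into_blocks), the kernel parameterization exp(-gamma * ||x - y||^2), and the toy data are all hypothetical, and the block number is passed in by hand rather than chosen by the proposed criterion, which the abstract does not spell out.

```python
import numpy as np


def rff_map(X, n_features, gamma, rng):
    """Random Fourier features for k(x, y) = exp(-gamma * ||x - y||^2).

    Assumed parameterization: frequencies are drawn from N(0, 2 * gamma * I)
    and phases from U[0, 2 * pi], so that z(x) @ z(y) approximates k(x, y).
    """
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)


def gaussian_kernel(X, gamma):
    """Exact Gaussian kernel matrix, used only to check the approximation."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))


def split_into_blocks(Z, y, n_blocks):
    """Split features and labels into n_blocks nearly equal blocks.

    n_blocks is a free parameter here; the thesis selects it with its own
    criterion, which this sketch does not reproduce.
    """
    return list(zip(np.array_split(Z, n_blocks), np.array_split(y, n_blocks)))


# Toy data (hypothetical): 40 points in 5 dimensions with +/-1 labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = np.where(X[:, 0] > 0, 1, -1)

Z = rff_map(X, n_features=4000, gamma=0.1, rng=rng)
max_err = np.max(np.abs(Z @ Z.T - gaussian_kernel(X, gamma=0.1)))
blocks = split_into_blocks(Z, y, n_blocks=4)
```

In a distributed setting each block would be handed to one worker, which fits a linear SVM on its mapped features while a consensus scheme such as ADMM ties the local solutions together; that step is omitted here because the abstract gives no details of it.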