Data Segmentation For Large-scale Distributed Approximated SVM

Posted on:2019-08-26

Degree:Master

Type:Thesis

Country:China

Candidate:C Zhang

Full Text:PDF

GTID:2428330593451067

Subject:Computer technology

Abstract/Summary:

Distributed computing is an important way to scale up machine learning. Data segmentation is one of the critical issues in distributed machine learning research, because it affects both the generalization performance and the parallel efficiency of distributed algorithms. Choosing a suitable block number has therefore become a vital research topic in model selection for distributed machine learning. Existing approaches to data segmentation rely on empirical evidence or on the number of processors, without an explicit criterion, so they lack justification and interpretability. To address this issue, we propose a parallel-efficiency-sensitive criterion for data segmentation with a generalization-theory guarantee, which improves the computational efficiency of distributed machine learning while retaining test accuracy. We first derive an upper bound on the generalization error with respect to the block number of the data segmentation, and define blocked empirical risk minimization following empirical risk minimization theory. We then present a data segmentation criterion that is a trade-off between the generalization error and the parallel efficiency. Finally, we implement large-scale Gaussian kernel support vector machines (SVMs) in the random Fourier feature space with the alternating direction method of multipliers (ADMM) framework on high-performance computing clusters (HPCC), adopting the proposed data segmentation criterion. Experimental results on several large-scale benchmark datasets show that the proposed criterion is reliable and effective for large-scale SVMs. The proposed criterion is general and is not restricted to the distributed ADMM framework or to SVMs.
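The two ingredients named in the abstract, the random Fourier feature space and the splitting of the data into blocks, can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the synthetic data, the value of gamma, the feature count, and the block number of 4 are all assumptions chosen for illustration, and the proposed block-number criterion and the ADMM solver are not reproduced here.

```python
import numpy as np

def random_fourier_features(X, n_features, gamma, rng):
    """Map X (n_samples, d) to features whose inner products approximate
    the Gaussian kernel exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    # Frequencies are drawn from the kernel's spectral density N(0, 2*gamma*I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    # Random phases are uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def split_into_blocks(Z, y, n_blocks):
    """Partition the samples into n_blocks nearly equal blocks, one per worker."""
    return list(zip(np.array_split(Z, n_blocks), np.array_split(y, n_blocks)))

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sign(X[:, 0])
gamma = 0.5

Z = random_fourier_features(X, n_features=2000, gamma=gamma, rng=rng)

# Compare the approximate kernel Z Z^T with the exact Gaussian kernel.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_exact = np.exp(-gamma * sq_dists)
K_approx = Z @ Z.T
max_err = np.abs(K_exact - K_approx).max()

# The block number 4 is a placeholder, not a value chosen by the criterion.
blocks = split_into_blocks(Z, y, n_blocks=4)
print(f"max kernel approximation error: {max_err:.3f}")
print("block sizes:", [len(yb) for _, yb in blocks])
```

In a distributed setting each block would be handed to one worker, which is where the block number enters both the generalization bound and the parallel efficiency discussed above.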

Keywords/Search Tags:

Data segmentation, Large-scale SVMs, Model selection, ADMM, Random Fourier features

Related items

1	Random Mapping Approach To Model Selection Of Large-Scale Kernel Methods
2	Sublinear Algorithms For Large-scale Kernel Learning
3	Some Studies On Subsampling And Variable Selection In Large-scale Data
4	Research On The ADMM-Based Algorithm For Large Scale Array Pattern Synthesis
5	Research On Key Problems About Large-Scale Text Clustering
6	Research On Large Scale Data Clustering Analysis Methods
7	Research On Large-scale Regularized Machine Learning Algorithms
8	Research And Application Of Clustering Algorithms For Large Scale Data
9	Research On Key Technologies Of Large-scale Graph Processing
10	Segmentation Model Of Illusion Contour Image And Its Fast Algorithm