
Quantized Adaptive Subgradient Algorithms And Their Applications

Posted on: 2021-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: K Xu
Full Text: PDF
GTID: 2428330611965676
Subject: Software engineering
Abstract/Summary:
The explosion of data and the growth of model size drive the remarkable advances in large-scale machine learning. Yet even for simple linear models, high-dimensional data make model training time-consuming and model storage difficult. At the same time, many complex deep neural networks are designed to handle a variety of tasks, so large-scale machine learning is characterized by high computational complexity and a huge number of model parameters. Solving the problems of time-consuming training and limited storage space involves two main difficulties. On the one hand, distributed training is often used to accelerate model training, and the communication cost of exchanging information, e.g., stochastic gradients among workers, is a key bottleneck for training efficiency. On the other hand, models with many parameters are difficult to deploy directly on devices with limited memory and computing resources, so a sparse model is usually needed for easy storage. How to balance model sparsity, performance, and communication cost in a distributed framework remains an open question.

To overcome both difficulties simultaneously, we propose the quantized composite mirror descent adaptive subgradient method (Quantized CMD AdaGrad) and the quantized regularized dual averaging adaptive subgradient method (Quantized RDA AdaGrad) for distributed training. Specifically, we use gradient quantization to reduce the per-iteration communication cost of distributed training and construct an adaptive learning-rate matrix from the quantized gradients to balance communication cost, accuracy, and model sparsity. Moreover, we show theoretically that a large quantization error introduces extra noise, which harms both the convergence and the sparsity of the model. Therefore, a threshold quantization strategy with a relatively small error is adopted in Quantized CMD AdaGrad and Quantized RDA AdaGrad to improve the signal-to-noise ratio and preserve model sparsity. We establish the convergence rate of the proposed algorithms, and both theoretical analyses and empirical results demonstrate their efficacy and efficiency.
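
The abstract gives no pseudocode, so the following is a minimal, illustrative Python sketch of the two ingredients it describes: a threshold quantizer that keeps only large-magnitude gradient coordinates, and a diagonal-AdaGrad composite mirror descent step with an l1 term driven by the quantized gradient. The function names, the exact quantization rule, and the hyperparameters (tau, lr, lam, delta) are illustrative assumptions rather than the thesis' precise formulation; the update itself is the standard soft-thresholding form of composite mirror descent AdaGrad.

import numpy as np

def threshold_quantize(g, tau):
    """Map each coordinate of g to {-tau, 0, +tau}: coordinates whose magnitude
    is below the threshold are dropped, the rest keep only their sign (scaled
    by tau). Illustrative rule; the thesis' exact threshold scheme may differ."""
    q = np.zeros_like(g)
    mask = np.abs(g) >= tau
    q[mask] = tau * np.sign(g[mask])
    return q

def quantized_cmd_adagrad_step(x, G, g_quant, lr=0.1, lam=1e-3, delta=1e-8):
    """One diagonal-AdaGrad composite mirror descent step driven by a quantized
    gradient, with the l1 regularizer handled in closed form by per-coordinate
    soft thresholding (which is what produces a sparse model)."""
    G = G + g_quant ** 2            # accumulate squared (quantized) gradients
    H = delta + np.sqrt(G)          # diagonal adaptive learning-rate matrix
    z = x - lr * g_quant / H        # unregularized proximal point
    x_new = np.sign(z) * np.maximum(np.abs(z) - lr * lam / H, 0.0)
    return x_new, G

# Toy usage: synthetic gradients stand in for a worker's stochastic gradients.
rng = np.random.default_rng(0)
d = 10
x, G = np.zeros(d), np.zeros(d)
for _ in range(100):
    g = rng.normal(size=d)
    x, G = quantized_cmd_adagrad_step(x, G, threshold_quantize(g, tau=0.5))

In a distributed run, each worker would communicate only its quantized gradient, and the aggregated quantized gradients would drive this update; the RDA variant replaces the proximal step with a dual-averaging step but consumes the same quantized inputs.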
Keywords/Search Tags: distributed training, adaptive subgradient, gradient quantization, sparse model