
Quantized Adaptive Subgradient Algorithms And Their Applications

Posted on: 2021-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: K Xu
Full Text: PDF
GTID: 2428330611965676
Subject: Software engineering
Abstract/Summary:
The explosion of data and the growth of model size drive the remarkable advances in large-scale machine learning. Yet even for simple linear models, high-dimensional data make model training time-consuming and model storage difficult. At the same time, many complex deep neural networks are designed to handle a variety of tasks, so large-scale machine learning is characterized by high computational complexity and a huge number of model parameters. Solving the problems of time-consuming training and limited storage space involves two main difficulties. On the one hand, distributed training is often used to accelerate model training, and the communication cost of exchanging information, e.g., stochastic gradients among workers, is a key bottleneck for training efficiency. On the other hand, models with many parameters are difficult to deploy directly on devices with limited memory and computing resources, so a sparse model is usually needed for easy storage. How to balance model sparsity, performance, and communication cost in a distributed framework remains an open question.

To overcome both difficulties simultaneously, we propose the quantized composite mirror descent adaptive subgradient method (Quantized CMD AdaGrad) and the quantized regularized dual averaging adaptive subgradient method (Quantized RDA AdaGrad) for distributed training. Specifically, we use gradient quantization to reduce the per-iteration communication cost of distributed training and construct an adaptive learning-rate matrix from the quantized gradients to balance communication cost, accuracy, and model sparsity. Moreover, we show theoretically that a large quantization error introduces extra noise, which harms both the convergence and the sparsity of the model. Therefore, a threshold quantization strategy with a relatively small error is adopted in Quantized CMD AdaGrad and Quantized RDA AdaGrad to improve the signal-to-noise ratio and preserve model sparsity. We establish the convergence rate of the proposed algorithms, and both theoretical analyses and empirical results demonstrate their efficacy and efficiency.
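
The abstract gives no pseudocode, so the following is a minimal, illustrative Python sketch of the two ingredients it describes: a threshold quantizer that keeps only large-magnitude gradient coordinates, and a diagonal-AdaGrad composite mirror descent step with an l1 term driven by the quantized gradient. The function names, the exact quantization rule, and the hyperparameters (tau, lr, lam, delta) are illustrative assumptions rather than the thesis' precise formulation; the update itself is the standard soft-thresholding form of composite mirror descent AdaGrad.

import numpy as np

def threshold_quantize(g, tau):
    """Map each coordinate of g to {-tau, 0, +tau}: coordinates whose magnitude
    is below the threshold are dropped, the rest keep only their sign (scaled
    by tau). Illustrative rule; the thesis' exact threshold scheme may differ."""
    q = np.zeros_like(g)
    mask = np.abs(g) >= tau
    q[mask] = tau * np.sign(g[mask])
    return q

def quantized_cmd_adagrad_step(x, G, g_quant, lr=0.1, lam=1e-3, delta=1e-8):
    """One diagonal-AdaGrad composite mirror descent step driven by a quantized
    gradient, with the l1 regularizer handled in closed form by per-coordinate
    soft thresholding (which is what produces a sparse model)."""
    G = G + g_quant ** 2            # accumulate squared (quantized) gradients
    H = delta + np.sqrt(G)          # diagonal adaptive learning-rate matrix
    z = x - lr * g_quant / H        # unregularized proximal point
    x_new = np.sign(z) * np.maximum(np.abs(z) - lr * lam / H, 0.0)
    return x_new, G

# Toy usage: synthetic gradients stand in for a worker's stochastic gradients.
rng = np.random.default_rng(0)
d = 10
x, G = np.zeros(d), np.zeros(d)
for _ in range(100):
    g = rng.normal(size=d)
    x, G = quantized_cmd_adagrad_step(x, G, threshold_quantize(g, tau=0.5))

In a distributed run, each worker would communicate only its quantized gradient, and the aggregated quantized gradients would drive this update; the RDA variant replaces the proximal step with a dual-averaging step but consumes the same quantized inputs.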
Keywords/Search Tags: distributed training, adaptive subgradient, gradient quantization, sparse model