
Distributed Machine Learning With Adaptive Sample Selection

Posted on: 2021-04-01
Degree: Master
Type: Thesis
Country: China
Candidate: H Gao
Full Text: PDF
GTID: 2428330647451042
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, artificial intelligence technologies have been successfully applied in many fields such as computer vision, speech processing, and natural language processing. At the same time, as application scenarios grow more complex, we often need massive training data and large-scale machine learning models to achieve our goals. For such large-scale machine learning tasks it is difficult to train a model on a single machine, so distributed machine learning, in which multiple machines cooperate, has become the mainstream solution. In most machine learning tasks, every epoch of training requires all training samples to participate, so for tasks with many training samples a single epoch takes a long time. In addition, during distributed training the machines inevitably need to communicate to exchange information, which introduces additional communication overhead. Therefore, both computation overhead and communication overhead affect the speed of distributed training. In previous research we proposed the Adaptive Sample Selection (ADASS) algorithm, which reduces the computation cost of the training process. This thesis introduces ADASS into distributed machine learning and designs a distributed machine learning algorithm with adaptive sample selection (ADASS-DML). It also designs algorithms that reduce the communication overhead of ADASS-DML, yielding a solution for distributed machine learning that is efficient in both computation and communication. Specifically, this thesis makes the following contributions (minimal illustrative sketches of the three techniques follow the list):

1. We apply adaptive sample selection to distributed machine learning and design and implement the ADASS-DML algorithm under the two currently common communication frameworks, the Parameter Server framework and the Ring All-Reduce framework. The algorithm adaptively selects a subset of important samples to participate in the next epoch of training according to the real-time training state, thereby speeding up training without sacrificing model accuracy. We verify the effectiveness of ADASS-DML through experiments on real data sets and compare its training speed under the two communication frameworks. We find that even under the more efficient Ring All-Reduce framework, communication overhead remains a bottleneck for distributed training efficiency; the following two communication compression algorithms are designed to address this problem.

2. In the Ring All-Reduce framework, the gradients exchanged between worker nodes are usually 32-bit floating-point numbers. Some existing quantization algorithms for other communication frameworks represent the communicated tensors with fewer bits to reduce communication overhead, but they cannot be applied directly to the Ring All-Reduce framework. This thesis therefore designs and implements Q-ADASS, a quantization algorithm combined with adaptive sample selection under the Ring All-Reduce framework. Experiments on real data sets show that the algorithm can represent the communicated tensors with low-bit values while performing adaptive sampling, without affecting the accuracy of the final model.
3. The communication compression brought by quantization alone cannot completely solve the communication overhead problem of the distributed ADASS algorithm. Based on the characteristics of the Ring All-Reduce framework, this thesis designs and implements RS-ADASS, a random sparsification algorithm combined with adaptive sample selection. When synchronizing the model, the algorithm does not need to communicate the complete gradient; only a small fraction of the gradient's dimensions is communicated. Experiments on real data sets show that the algorithm further reduces communication overhead during training without sacrificing model accuracy.
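To make the sample-selection idea in contribution 1 concrete, here is a minimal Python sketch of one plausible adaptive criterion: keep for the next epoch the samples whose loss improved least between epochs (the "hard" samples). The improvement-based rule and the keep_frac parameter are illustrative assumptions, not the exact ADASS criterion from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_samples(curr_loss, prev_loss, keep_frac=0.5):
    # Improvement of each sample since the last epoch;
    # a small improvement means the sample is still "hard".
    improvement = prev_loss - curr_loss
    k = max(1, int(keep_frac * len(curr_loss)))
    # Keep the k least-improved samples for the next epoch.
    return np.argsort(improvement)[:k]

# Toy usage: per-sample losses from two consecutive epochs.
prev = rng.uniform(0.5, 2.0, size=10)
curr = prev - rng.uniform(0.0, 0.5, size=10)
print(select_samples(curr, prev, keep_frac=0.4))
```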
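For contribution 2, the following sketch shows a generic low-bit uniform quantizer with stochastic rounding, the general kind of scheme gradient-quantization methods rely on. The 4-bit default and the max-magnitude scaling are assumptions for illustration; the actual Q-ADASS design, adapted to Ring All-Reduce, may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(grad, bits=4):
    # Map |grad| onto integer levels {0, ..., 2^bits - 1},
    # scaled by the tensor's maximum magnitude.
    levels = 2 ** bits - 1
    scale = float(np.max(np.abs(grad))) or 1.0  # avoid division by zero
    normalized = np.abs(grad) / scale * levels
    floor = np.floor(normalized)
    # Stochastic rounding keeps the quantizer unbiased in expectation.
    codes = floor + (rng.random(grad.shape) < (normalized - floor))
    return np.sign(grad) * codes, scale, levels

def dequantize(codes, scale, levels):
    return codes / levels * scale

g = rng.standard_normal(8).astype(np.float32)
codes, scale, levels = quantize(g)
print(g)
print(dequantize(codes, scale, levels))  # low-bit approximation of g
```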
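For contribution 3, this sketch illustrates generic random sparsification: a worker communicates only a random subset of gradient coordinates, rescaled by 1/keep_frac so the sparse estimate remains unbiased in expectation. The function names and the rescaling step are illustrative assumptions, not necessarily the exact RS-ADASS construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_sparsify(grad, keep_frac=0.25):
    # Choose a random subset of coordinates to communicate.
    n = grad.size
    k = max(1, int(keep_frac * n))
    idx = rng.choice(n, size=k, replace=False)
    # Rescale so the sparse gradient is unbiased in expectation.
    values = grad.ravel()[idx] / keep_frac
    return idx, values

def densify(idx, values, shape):
    # The receiver reassembles a full-size, mostly zero gradient.
    out = np.zeros(int(np.prod(shape)))
    out[idx] = values
    return out.reshape(shape)

g = rng.standard_normal((4, 4))
idx, vals = random_sparsify(g)
print(densify(idx, vals, g.shape))
```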
Keywords: Machine Learning, Distributed Computing, Sample Selection, Communication Compression