
Research On Distributed Deep Learning Technology Based On Model Averaging

Posted on: 2020-12-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y W Fu
GTID: 2518306548493964
Subject: Software engineering
Abstract/Summary:
In the era of big data, deep learning has achieved strong performance in tasks such as image classification, speech recognition, and machine translation. This success is closely tied to the scale and quality of the available data: more high-quality data allows deep neural network models to improve substantially. Under realistic conditions, however, only a few institutions are able to collect large-scale data, while most can collect only small amounts. For specialized, highly private data, institutions are often reluctant to share, and laws and regulations on data privacy protection make the direct transmission of raw data difficult to implement. To break down the data barriers between institutions, the distributed deep learning algorithm known as model averaging offers a simple form of cross-institutional joint learning. However, the naive model averaging algorithm suffers from high communication overhead, limited model performance, and indirect leakage of data privacy. This paper studies these issues, and its main contributions are as follows:

1. To reduce communication overhead, this paper designs and implements a multi-model compression algorithm based on knowledge distillation. Since the model averaging algorithm transmits model parameters, reducing the size of the model directly improves the communication efficiency of the algorithm. The algorithm takes sound event detection as the target task. First, the architectures of the teacher models (complex pre-trained models with better performance) and the student model (a compact model with weaker performance) are determined. Then, according to the characteristics of the task, a frame-wise distillation method that transfers temporal knowledge is proposed. Finally, multi-model distillation is proposed to exploit the experiential knowledge of multiple parties. Experimental results show that the multi-model distillation algorithm compresses the student model to between 1/39 and 1/53 of the teacher model's size while improving the student model's performance.

2. To strengthen data privacy protection, this paper designs and implements a collaborative learning algorithm based on data augmentation. Because the model averaging algorithm transmits model parameters directly during training, it carries a risk of indirect data privacy leakage. The proposed algorithm modifies the training procedure of the naive model averaging algorithm by applying Mixup data augmentation to the training data before each local training step. This replaces the principle of empirical risk minimization with vicinal risk minimization and protects data privacy by changing the original data distribution. Experimental results show that the collaborative learning algorithm improves model performance on image classification and text classification tasks, and a comparison of model inversion attack results on a face recognition task demonstrates that it strengthens protection against indirect data privacy attacks.

Building on the model averaging algorithm, this paper studies a model compression technique that reduces communication overhead and a joint learning algorithm that improves data privacy protection; both methods also improve model performance. The paper thus provides a feasible solution for distributed learning across institutions in WAN environments.
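The naive model averaging algorithm discussed above can be illustrated with a short sketch: each party trains a copy of the shared model on its private data, and only the parameters are exchanged and averaged each round. The following Python/PyTorch code is an illustrative outline under common assumptions, not the implementation used in the thesis; the helper names (average_state_dicts, model_averaging_round, local_train_fn) are hypothetical.

import copy
import torch

def average_state_dicts(state_dicts):
    # Element-wise average of parameters collected from all parties.
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        stacked = torch.stack([sd[key].float() for sd in state_dicts], dim=0)
        avg[key] = stacked.mean(dim=0).to(state_dicts[0][key].dtype)
    return avg

def model_averaging_round(global_model, party_loaders, local_train_fn):
    # One communication round: every party trains a copy of the global model
    # on its own private data; raw data never leaves the institution, only
    # the model parameters are exchanged and averaged.
    local_states = []
    for loader in party_loaders:
        local_model = copy.deepcopy(global_model)
        local_train_fn(local_model, loader)        # a few local epochs of SGD
        local_states.append(local_model.state_dict())
    global_model.load_state_dict(average_state_dicts(local_states))
    return global_model

Repeating model_averaging_round for several rounds yields the joint model; the communication cost per round is proportional to the model size, which is what the distillation-based compression below targets.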
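The frame-wise distillation idea for sound event detection can be sketched as a loss that matches the student's frame-level posteriors to the teacher's, alongside the ground-truth loss. This is a minimal sketch under assumed conventions (multi-label frame-level targets, sigmoid outputs, a simple average over several teachers' posteriors); the weighting alpha, the tensor shapes, and the function names are illustrative, not taken from the thesis.

import torch
import torch.nn.functional as F

def multi_teacher_targets(teacher_prob_list):
    # Combine frame-level posteriors from several pre-trained teachers
    # (here a simple average) to serve as soft targets for the student.
    return torch.stack(teacher_prob_list, dim=0).mean(dim=0)

def frame_wise_distillation_loss(student_logits, teacher_probs, frame_labels, alpha=0.5):
    # student_logits: (batch, time, classes) raw outputs of the compact student model
    # teacher_probs:  (batch, time, classes) frame-level posteriors from the teacher(s)
    # frame_labels:   (batch, time, classes) frame-level ground-truth labels (float)
    hard_loss = F.binary_cross_entropy_with_logits(student_logits, frame_labels)
    # Matching the teacher frame by frame transfers temporal knowledge to the student.
    soft_loss = F.binary_cross_entropy_with_logits(student_logits, teacher_probs)
    return alpha * hard_loss + (1.0 - alpha) * soft_loss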
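Applying Mixup to each party's training batch before local training can be sketched as follows; the Beta-distribution parameter alpha and the assumption of one-hot or multi-hot label tensors are illustrative choices rather than details from the thesis.

import numpy as np
import torch

def mixup_batch(inputs, targets, alpha=1.0):
    # Mixup: convex combinations of pairs of examples and of their labels,
    # which shifts training from empirical to vicinal risk minimization and
    # means local updates are computed on perturbed rather than raw samples.
    lam = float(np.random.beta(alpha, alpha))      # mixing coefficient ~ Beta(alpha, alpha)
    perm = torch.randperm(inputs.size(0))          # random pairing within the batch
    mixed_inputs = lam * inputs + (1.0 - lam) * inputs[perm]
    mixed_targets = lam * targets + (1.0 - lam) * targets[perm]
    return mixed_inputs, mixed_targets

In a sketch like the model averaging round above, mixup_batch would be applied inside local_train_fn to every batch before the forward pass.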
Keywords/Search Tags:Distributed deep learning, model averaging, model compression, data privacy protection, knowledge distillation, data augmentation