
Optimization Method For Federated Learning Model With Unbalanced Dataset

Posted on: 2022-12-21  Degree: Master  Type: Thesis
Country: China  Candidate: S Hao  Full Text: PDF
GTID: 2518306752454154  Subject: Master of Engineering
Abstract/Summary:
With the rapid development of artificial intelligence and the popularity of mobile devices, deep learning applications affect every aspect of our lives. The success of deep learning is inseparable from the high representational capacity of neural networks and from large, rich datasets. The role of distributed data processing and distributed machine learning is becoming increasingly prominent, and the need for multi-participant cooperation is emerging. In practice, however, many data owners are unwilling or not allowed to share their local data due to privacy, security, and confidentiality policies. Moreover, aggregating the data in a data center for training incurs prohibitive communication and storage costs. Federated learning therefore came into being. Federated learning is a distributed machine learning technology that enables multiple devices or institutions to cooperatively build a model from their local datasets while keeping the data local. Federated learning is especially promising for mobile device systems: people generate a large amount of data every day and store it on private devices, and these real and abundant data can help federated learning build more complex and accurate models and provide a personalized user experience.

However, unlike standard datasets, the datasets scattered across mobile devices have unbalanced distributions, including global imbalance (class imbalance) and local imbalance (non-independent and identically distributed data, Non-I.I.D.). This characteristic biases the model and greatly reduces its accuracy. In addition, federated learning usually uses a predefined neural network architecture as the initial model, but this predefined structure is not necessarily optimal for federated training under unbalanced data distributions, and because the training cycle of federated learning is long, it is very inefficient for researchers to tune the network structure manually. To address these problems, this thesis optimizes the model under unbalanced datasets in federated learning. The main contributions are as follows:

· First, this thesis explores, through experiments and mathematical derivation, the impact and causes of unbalanced datasets on neural network models. The conclusions are that unbalanced datasets lead to a significant decline in model accuracy; that the heterogeneity between the training-set and validation-set distributions is the direct cause of this decline; that when the network structure is fixed, the optimal weights obtainable under a balanced dataset cannot be reached by training alone; and that accuracy is restored when the training set and the validation set follow the same distribution.

· To address local imbalance, a model training scheduling optimization method based on KL divergence is proposed. According to the balance degree of each client's dataset, a greedy grouping method is designed to form locally balanced datasets, and an intermediate server is assigned to manage training within each group. This thesis also explores the influence of data distribution on convergence efficiency and concludes that convergence is fastest when the model is trained on groups in order of balance from high to low; the KL-divergence-based scheduling method is designed accordingly (a minimal code sketch follows this list).

· To address global imbalance, a federated learning architecture based on neural architecture search is designed. For the scenario in which resource-constrained mobile devices act as federated training entities, a group-based search strategy is designed: intermediate servers with strong computing power and sufficient storage cooperate with the clients to search the neural network structure (see the second sketch after this list). Experiments show that the accuracy of DenseNet on an unbalanced CIFAR-10 is improved by 4.39% on average.
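The grouping and scheduling idea can be illustrated with a short sketch. The Python code below is a minimal illustration under assumptions, not the thesis's implementation: it measures each client's balance degree as the KL divergence between its label distribution and the uniform distribution, greedily merges clients so that each group's combined data is as balanced as possible, and then orders the groups from most to least balanced to form the training schedule. The names balance_degree and greedy_grouping and the group_size parameter are introduced here purely for illustration.

import numpy as np

def balance_degree(label_counts, num_classes):
    # KL divergence between a client's (or group's) label distribution and the
    # uniform distribution; smaller values mean a more balanced dataset.
    counts = np.asarray(label_counts, dtype=float)
    p = counts / counts.sum()
    q = np.full(num_classes, 1.0 / num_classes)
    mask = p > 0  # empty classes contribute 0 to sum(p * log(p/q))
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def greedy_grouping(client_counts, num_classes, group_size):
    # Greedily assign clients to groups so that each group's combined label
    # distribution is as close to uniform as possible, then return the groups
    # ordered from most balanced to least balanced (the training schedule).
    remaining = list(range(len(client_counts)))
    groups = []
    while remaining:
        # Seed a new group with the most unbalanced remaining client.
        seed = max(remaining, key=lambda c: balance_degree(client_counts[c], num_classes))
        remaining.remove(seed)
        group = [seed]
        group_counts = np.array(client_counts[seed], dtype=float)
        while len(group) < group_size and remaining:
            # Add the client whose data best balances the group's distribution.
            best = min(remaining,
                       key=lambda c: balance_degree(group_counts + client_counts[c], num_classes))
            group_counts += client_counts[best]
            remaining.remove(best)
            group.append(best)
        groups.append((group, balance_degree(group_counts, num_classes)))
    groups.sort(key=lambda g: g[1])  # most balanced groups are scheduled first
    return [g[0] for g in groups]

# Hypothetical label counts for four clients over three classes.
clients = [[100, 5, 0], [0, 80, 20], [30, 30, 40], [10, 0, 90]]
schedule = greedy_grouping(clients, num_classes=3, group_size=2)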
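The abstract does not spell out the search algorithm itself, so the second sketch below only illustrates the division of labor it describes: an intermediate server drives the architecture search (here reduced to a simple random search) while the clients in its group evaluate each candidate on their local data and report results back. SEARCH_SPACE, evaluate_on_client, and num_trials are hypothetical placeholders introduced for illustration.

import random

# Hypothetical candidate space: depth and width choices for a small network.
SEARCH_SPACE = [{"depth": d, "width": w} for d in (4, 8, 12) for w in (16, 32, 64)]

def search_architecture(group_clients, evaluate_on_client, num_trials=6):
    # Run by an intermediate server: sample candidate architectures, let each
    # client in the group report a validation score for the candidate, and
    # keep the architecture with the best average score.
    best_arch, best_score = None, -1.0
    for arch in random.sample(SEARCH_SPACE, num_trials):
        # Clients train/evaluate the candidate briefly on their local data;
        # the server only aggregates the reported scores.
        scores = [evaluate_on_client(client, arch) for client in group_clients]
        avg = sum(scores) / len(scores)
        if avg > best_score:
            best_arch, best_score = arch, avg
    return best_arch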
Keywords/Search Tags: distributed machine learning, federated learning, unbalanced dataset, neural architecture search, Non-I.I.D.