Machine learning is moving from the central cloud to the edge for data collection and model training. Modern mobile and IoT devices, such as smartphones, wearables, and smart appliances, generate huge volumes of data every day, and extracting value from this data while ensuring privacy and security is a challenge. To reduce data-security risks and protect user data, leading companies such as Google and Apple have proposed federated learning, which ensures that users' privacy is not compromised. Its core idea is that multiple parties collaboratively train a global model without the data on user devices ever leaving the device: each client receives the initial model broadcast by the server, trains it locally with SGD, and uploads the resulting local model to the central server, which aggregates the local models into a global model. In practical application scenarios, the sample sizes and data distributions of different client devices vary greatly; that is, the clients' local data in a federated learning system are non-independent and identically distributed (Non-IID). From the working principle of federated learning it follows that the client-selection stage determines the training samples, and the quality of those samples directly affects model accuracy and convergence speed; likewise, the way local models are aggregated into the global model determines how much importance each local model is given, and thus the overall training accuracy. In the client-selection phase, randomly selected clients often fail to reflect the distribution of the global data, resulting in inefficient training of the global model and low model accuracy. In the model-aggregation phase, existing algorithms do not account for the local-model differences caused by data heterogeneity, so poor-quality models receive excessive aggregation weights, which degrades the accuracy of the global model.

This paper focuses on the above issues, and its main contributions are as follows:

(1) To address the problem that randomly selected clients do not reflect the global data distribution, this paper proposes ChFL, a client-selection method based on local-model quality. ChFL uses each client's loss value and training time as selection indicators, assigning higher selection probability to clients with high loss values and fast training, thereby increasing their chance of participating in training and improving the convergence efficiency of the model. In the experiments, the proposed method was compared with three baseline algorithms, FedAvg, FedProx, and FedNova, on five datasets including FEMNIST, EMNIST, and MNIST. The results show that the convergence performance of the ChFL client-selection strategy improves by about 15% over FedAvg, about 13% over FedProx, and about 12% over FedNova.

(2) To address the problem that data heterogeneity harms model accuracy, this paper proposes FedAG, a weighted aggregation method based on data importance. FedAG takes the skewness of each client's data distribution, its data volume, and its data quality as indicators, constructs a data-importance evaluation model from these three indicators, and then aggregates the local models into the global model with weights derived from that evaluation. The algorithm assigns larger aggregation weights to local models trained on high-quality data, improving the accuracy of the global model. In the experiments, the proposed method was compared with the FedAvg, FedProx, and Scaffold baselines on five datasets including EMNIST, MNIST, and CIFAR-10. The results show that FedAG improves accuracy by up to 6.5% over FedAvg, 1.8% over FedProx, and 2.1% over Scaffold.
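The basic federated learning round described above (server broadcasts a model, clients train locally with SGD, the server averages the results) can be sketched as follows. This is a minimal illustrative sketch of a FedAvg-style round on a linear model; the function names `local_sgd` and `fedavg_round`, the learning rate, and the squared-error objective are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def local_sgd(weights, data, labels, lr=0.05, epochs=1):
    """One client's local training step: plain SGD on a linear model (illustrative)."""
    w = weights.copy()
    for _ in range(epochs):
        for x, y in zip(data, labels):
            pred = x @ w
            grad = (pred - y) * x  # gradient of squared error for this sample
            w -= lr * grad
    return w

def fedavg_round(global_w, clients):
    """Server-side aggregation: average local models weighted by sample count."""
    total = sum(len(data) for data, _ in clients)
    agg = np.zeros_like(global_w)
    for data, labels in clients:
        local_w = local_sgd(global_w, data, labels)
        agg += (len(data) / total) * local_w  # weight by client data volume
    return agg
```

In each communication round the server would call `fedavg_round` with the current global weights and the participating clients' data handles; real systems exchange model updates over the network rather than raw data.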
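The ChFL selection rule (favour clients with high loss and short training time) could be realised as a probability distribution over clients, for example as below. The scoring formula, the balance parameter `alpha`, and the function names are assumptions for illustration; the abstract only states which indicators are used, not how they are combined.

```python
import numpy as np

def chfl_selection_probs(losses, train_times, alpha=0.5):
    """Hypothetical ChFL-style scores: higher loss and faster training
    both raise a client's selection probability. alpha balances the two
    signals (the exact weighting is not specified in the abstract)."""
    loss_score = np.asarray(losses, float)
    loss_score = loss_score / loss_score.sum()
    speed_score = 1.0 / np.asarray(train_times, float)  # faster -> larger
    speed_score = speed_score / speed_score.sum()
    score = alpha * loss_score + (1.0 - alpha) * speed_score
    return score / score.sum()

def select_clients(probs, k, rng=None):
    """Sample k distinct client indices according to the selection probabilities."""
    if rng is None:
        rng = np.random.default_rng(0)
    return rng.choice(len(probs), size=k, replace=False, p=probs)
```

A client with a large loss and a short training time receives the largest probability, so it is more likely to appear in the next training round, which matches the intuition that such clients contribute most to convergence.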
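Similarly, the FedAG idea of weighting aggregation by data importance can be sketched as a score built from the three stated indicators. The normalisations, the inverse-skew transform, and the equal mixing weights `beta` are hypothetical choices; the thesis's actual data-importance evaluation model may differ.

```python
import numpy as np

def data_importance(n_samples, skewness, quality, beta=(1/3, 1/3, 1/3)):
    """Hypothetical importance score combining data volume, distribution
    skewness (lower skew -> higher score), and data quality."""
    n = np.asarray(n_samples, float)
    n = n / n.sum()
    s = 1.0 / (1.0 + np.asarray(skewness, float))  # penalise skewed distributions
    s = s / s.sum()
    q = np.asarray(quality, float)
    q = q / q.sum()
    w = beta[0] * n + beta[1] * s + beta[2] * q
    return w / w.sum()

def fedag_aggregate(local_models, importance):
    """Aggregate local models with importance-based weights instead of
    plain sample-count weights."""
    return sum(w * m for w, m in zip(importance, local_models))
```

Compared with FedAvg's sample-count weighting, this gives clients with balanced, high-quality data a larger say in the global model, which is the mechanism the abstract credits for the accuracy gains.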