Federated learning (FL) is a machine learning framework in which multiple nodes collaborate to train models under the orchestration of a central server while keeping the training data decentralized. In doing so, FL must contend with imbalanced and non-independent and identically distributed (Non-IID) data, limited communication bandwidth, and unreliable devices. FL embodies the principles of focused data collection and data minimization, which mitigates many of the systemic privacy risks and costs of centralized training. Because the training data never leaves its owners, FL has attracted increasing attention, as it avoids the economic, policy, and security risks associated with data exchange. However, FL still faces a number of challenges, chief among them data heterogeneity, which typically refers to Non-IID data. On the one hand, data heterogeneity causes the local models of different clients to diverge, which harms model aggregation and degrades both the efficiency (convergence rate) and the utility (global model accuracy) of FL. On the other hand, a complex data heterogeneity environment leads to more complex parameter transmission in FL. Although FL has inherent privacy-protecting properties, parameter transmission is still exposed to risks such as membership inference attacks, so additional privacy protection methods are needed to protect the model and the original data.

Therefore, this thesis proposes two new methods to improve the performance and security of FL under data heterogeneity, addressing the challenges of performance degradation and security protection respectively. The main work and contributions of this thesis are as follows:

(1) To address performance degradation in data heterogeneity environments, this thesis proposes a federated dynamic weighting (FedDW) algorithm. Based on an analysis of data heterogeneity, the algorithm introduces a data heterogeneity factor as a quantitative measure of heterogeneity and, by analyzing the model update submitted by each client in each round, a model update factor as a qualitative measure of heterogeneity. Combining the two factors yields a dynamic weighting scheme for model aggregation in heterogeneous environments (a minimal sketch of this weighting idea is given after this summary). Experiments verify the performance improvement of the algorithm.

(2) To address the security challenges in data heterogeneity environments, this thesis proposes a strict differentially private federated learning (DPFed) algorithm. Starting from the more complex federated learning process that arises under heterogeneity, the algorithm examines the relationship between the transmission of data skewness parameters (heterogeneity parameters) and the transmission of model updates. Both are brought into the same differential privacy framework, using a unified privacy budget to protect all parameter transmissions in federated learning and thus achieve stricter differential privacy protection (a sketch of this unified-budget perturbation also follows this summary). Experiments verify the security of the algorithm and its low impact on training performance.

(3) Based on the above two algorithms, a differentially private federated learning platform for data heterogeneity environments is designed and implemented. The platform integrates the two algorithms proposed in this thesis together with the classical federated averaging algorithm, and provides an intuitive visualization interface for federated learning simulation experiments, supporting research and learning on federated learning under data heterogeneity.
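As a rough illustration of the FedDW idea in item (1), the following Python sketch shows one way a per-client data heterogeneity factor and model update factor could be combined into dynamic aggregation weights. The abstract does not give the exact formulas, so the L1 distance between label distributions, the cosine similarity to the mean update, the balancing parameter alpha, and all function names below are illustrative assumptions, not the thesis's actual definitions.

```python
import numpy as np

# Hypothetical sketch of FedDW-style dynamic weighting.
# Assumptions (not specified in the abstract):
#  - data heterogeneity factor = L1 distance between a client's label
#    distribution and the global label distribution (in [0, 1]),
#  - model update factor = cosine similarity between a client's update
#    and the mean update of the current round (in [-1, 1]).

def heterogeneity_factor(client_label_dist, global_label_dist):
    """Assumed quantitative measure: total variation distance of label distributions."""
    return np.abs(np.asarray(client_label_dist) - np.asarray(global_label_dist)).sum() / 2.0

def update_factor(client_update, mean_update, eps=1e-12):
    """Assumed qualitative measure: cosine similarity to the round's mean update."""
    num = float(np.dot(client_update, mean_update))
    den = np.linalg.norm(client_update) * np.linalg.norm(mean_update) + eps
    return num / den

def dynamic_weights(het_factors, upd_factors, alpha=0.5):
    """Combine both factors into normalized aggregation weights.

    Clients with less skewed data (small heterogeneity factor) and with
    updates better aligned with the consensus (large update factor) get
    larger weights; alpha is an assumed balancing hyperparameter.
    """
    het = np.asarray(het_factors, dtype=float)
    upd = np.asarray(upd_factors, dtype=float)
    scores = alpha * (1.0 - het) + (1.0 - alpha) * (upd + 1.0) / 2.0
    scores = np.clip(scores, 1e-6, None)
    return scores / scores.sum()

def aggregate(updates, weights):
    """Weighted average of flattened client model updates."""
    return np.average(np.stack(updates), axis=0, weights=weights)
```

In this sketch the server would recompute the weights every round from the freshly reported factors, which is what makes the weighting "dynamic" rather than fixed by client data size as in federated averaging.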
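Similarly, a minimal sketch of the DPFed idea in item (2), i.e. protecting everything a client transmits in a round under one shared privacy budget, is given below. The clipping bounds, the use of the Gaussian mechanism, the noise scales sigma_update and sigma_skew, and their calibration against a unified (epsilon, delta) budget are all assumptions for illustration; the thesis's actual mechanism and accounting may differ.

```python
import numpy as np

# Hypothetical sketch of DPFed-style unified protection: both the data-skewness
# (heterogeneity) parameters and the model update sent by a client are clipped
# and perturbed, so that the two releases together consume a single privacy
# budget instead of protecting only the model update.

def clip_by_norm(vec, clip):
    """Scale vec so its L2 norm is at most clip (bounds the sensitivity)."""
    norm = np.linalg.norm(vec)
    return vec * min(1.0, clip / (norm + 1e-12))

def dp_client_message(model_update, skew_params, sigma_update, sigma_skew,
                      clip_update=1.0, clip_skew=1.0, rng=None):
    """Perturb everything the client sends in one round.

    sigma_update and sigma_skew are assumed to be calibrated jointly
    (e.g. via the Gaussian mechanism plus a composition/accounting method)
    so that releasing both quantities stays within one (epsilon, delta) budget.
    """
    rng = rng if rng is not None else np.random.default_rng()
    upd = clip_by_norm(np.asarray(model_update, dtype=float), clip_update)
    skw = clip_by_norm(np.asarray(skew_params, dtype=float), clip_skew)
    noisy_update = upd + rng.normal(0.0, sigma_update * clip_update, size=upd.shape)
    noisy_skew = skw + rng.normal(0.0, sigma_skew * clip_skew, size=skw.shape)
    return noisy_update, noisy_skew
```

The point of the sketch is only the structure of the protection: every transmitted quantity, not just the model update, passes through the same clip-and-noise step before leaving the client.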