
Research On Federated Learning Methods And Applications For Heterogeneous Data Sources

Posted on: 2022-12-13
Degree: Master
Type: Thesis
Country: China
Candidate: S Xi
GTID: 2518306614460714
Subject: Investment
Abstract/Summary:
Federated learning (FL) is a distributed machine learning method that enables multiple parties to collaboratively train a common model without transferring their locally collected data to a central server for storage. As a result, resource consumption in the cloud is reduced and client privacy is enhanced. Unlike traditional artificial intelligence techniques, however, this distributed learning method and its applications face major challenges. First, clients are independent of one another and do not share data, and differences in data collection methods and data sources lead to different distributions across the local datasets, which seriously degrades the performance of the trained model. Second, when federated learning is applied to medical diagnosis, the datasets are relatively small compared with common public datasets, so hospitals and other institutions may not trust the performance of the trained model. Third, medical data contains many kinds of private patient information. While data heterogeneity and availability are being addressed, the privacy protection of the datasets may be neglected, creating a risk that real data is disclosed. Privacy and security therefore remain a major challenge within the federated learning framework.

To address these problems, this thesis proposes FedSim, a federated learning algorithm based on cosine similarity. Regarding data heterogeneity, it mainly considers datasets that are not independent and identically distributed (Non-I.I.D). To address local imbalance, the computation of the loss function is modified so that the local model converges faster and performs better. To address global imbalance, the cosine similarity between the global distribution and each local distribution is used as a new aggregation weight on the server, alleviating the performance degradation caused by the Non-I.I.D problem. A residual neural network (ResNet) is used for local training on each client, and the experimental results show that the model outperforms the baseline models under different Non-I.I.D settings.

For the application of federated learning, this thesis performs diagnostic classification of chest X-ray images of patients with novel coronavirus (COVID-19) pneumonia and proposes a Federated Differential Privacy Generative Adversarial Network (FedDPGAN) model. Specifically, distributed DPGANs are used to generate diverse patient data and increase the number of training samples. During GAN training, the discriminator must distinguish generated data from real data; to protect data privacy in this process, differential privacy is introduced to ensure the privacy and security of the real training data. More importantly, the model can alleviate the effect of Non-I.I.D training data on model performance. In the experiments, several baseline models are compared with the proposed model, and its diagnostic accuracy is tested under both I.I.D and Non-I.I.D partitions of the dataset as well as under different levels of privacy protection. In these experiments, the proposed model outperforms the baseline models.

In summary, to address the Non-I.I.D problem of the datasets, this thesis proposes the FedSim algorithm for federated learning and the FedDPGAN model for federated medical diagnosis, and evaluates their performance on the FEMNIST dataset and the COVID-19 dataset.
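The global-imbalance step of FedSim weights each client by the cosine similarity between its local distribution and the global distribution, instead of by sample count as in FedAvg. The abstract does not give the exact formula, so the following is only a minimal sketch under stated assumptions: "distribution" is taken to mean the per-class label distribution, clients report only their class counts to the server, the similarities are normalized to sum to 1, and model parameters are flattened into plain lists. The function names are hypothetical.

```python
import math
from typing import List, Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def label_distribution(counts: Sequence[int]) -> List[float]:
    """Normalize per-class sample counts into a probability vector."""
    total = sum(counts)
    return [c / total for c in counts]

def similarity_weights(client_counts: Sequence[Sequence[int]]) -> List[float]:
    """Aggregation weights proportional to cos(local label dist, global label dist).

    Assumption: every client holds at least one sample and reports only its
    per-class counts (not the data itself) to the server.
    """
    num_classes = len(client_counts[0])
    # Global distribution: class counts pooled over all clients.
    global_counts = [sum(c[j] for c in client_counts) for j in range(num_classes)]
    global_dist = label_distribution(global_counts)
    sims = [cosine_similarity(label_distribution(c), global_dist) for c in client_counts]
    # Counts are non-negative and each client's classes appear in the global
    # counts, so every similarity lies in (0, 1]; normalize them to sum to 1.
    total = sum(sims)
    return [s / total for s in sims]

def aggregate(client_params: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted average of flattened client parameter vectors."""
    dim = len(client_params[0])
    return [sum(w * p[i] for w, p in zip(weights, client_params)) for i in range(dim)]

# Two balanced clients and one client that only holds class 0.
weights = similarity_weights([[50, 50], [50, 50], [100, 0]])
global_params = aggregate([[0.2, 1.0], [0.4, 1.0], [0.9, 1.0]], weights)
```

In this toy case all three clients hold 100 samples, so FedAvg would weight them equally, while the cosine weights come out at about 0.34, 0.34 and 0.32: the client whose label distribution is furthest from the global one contributes less to the aggregated model.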
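In FedDPGAN, differential privacy protects the real training data seen by the discriminator. The abstract does not state the mechanism; one common choice, used here purely as an assumption, is the DP-SGD recipe: clip each per-example discriminator gradient to an L2 norm bound, sum the clipped gradients, add Gaussian noise scaled by a noise multiplier, and average. Only the discriminator touches real data, so in this style of DPGAN the generator needs no extra noise. The helper names and parameters below are hypothetical, and no privacy accounting (epsilon, delta) is performed.

```python
import math
import random
from typing import List, Sequence

def clip(grad: Sequence[float], clip_norm: float) -> List[float]:
    """Scale a gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_average_gradient(per_example_grads: Sequence[Sequence[float]],
                        clip_norm: float,
                        noise_multiplier: float,
                        rng: random.Random) -> List[float]:
    """DP-SGD style aggregate: clip each example, sum, add Gaussian noise, average."""
    clipped = [clip(g, clip_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Noise standard deviation is proportional to the per-example sensitivity (clip_norm).
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    batch_size = len(per_example_grads)
    return [v / batch_size for v in noisy]

def sgd_step(params: Sequence[float], grad: Sequence[float], lr: float) -> List[float]:
    """Plain gradient-descent update of the discriminator parameters."""
    return [p - lr * g for p, g in zip(params, grad)]

# One noisy discriminator update on a toy batch of two per-example gradients.
rng = random.Random(0)
noisy_grad = dp_average_gradient([[3.0, 4.0], [0.3, 0.4]], clip_norm=1.0,
                                 noise_multiplier=1.1, rng=rng)
disc_params = sgd_step([0.5, -0.5], noisy_grad, lr=0.05)
```

A larger noise multiplier gives stronger privacy at the cost of noisier updates, which is one plausible way to realize the "different levels of privacy protection" compared in the experiments.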
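The experiments compare I.I.D and Non-I.I.D partitions of the data. FEMNIST is naturally partitioned by writer, but the abstract does not say how the Non-I.I.D splits of the COVID-19 dataset were built. A widely used way to simulate label skew, shown here only as an assumed illustration, is to split each class across clients according to proportions drawn from a Dirichlet distribution with concentration `alpha`: small `alpha` gives highly skewed clients, large `alpha` approaches an I.I.D split. The function name is hypothetical.

```python
import random
from typing import Dict, List, Sequence

def dirichlet_partition(labels: Sequence[int], num_clients: int,
                        alpha: float, seed: int = 0) -> List[List[int]]:
    """Assign sample indices to clients with Dirichlet(alpha) label skew (alpha > 0)."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    clients: List[List[int]] = [[] for _ in range(num_clients)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # A Dirichlet sample is a vector of independent Gamma(alpha, 1) draws, normalized.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(draws)
        props = [d / total for d in draws]
        # Turn the proportions into contiguous cut points over this class's indices.
        start, cumulative = 0, 0.0
        for k in range(num_clients):
            cumulative += props[k]
            end = len(idxs) if k == num_clients - 1 else int(round(cumulative * len(idxs)))
            clients[k].extend(idxs[start:end])
            start = end
    return clients

# 1000 samples over 10 classes, split across 5 clients with strong skew.
labels = [i % 10 for i in range(1000)]
parts = dirichlet_partition(labels, num_clients=5, alpha=0.5)
```

Every index is assigned to exactly one client; only the per-client class mix changes with `alpha`.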
Keywords/Search Tags: Federated Learning, Non-independent And Identical Distribution, Cosine Similarity, Differentially Private, Generative Adversarial Networks