
Multi-model Federated Learning Based On Distilled Data

Posted on: 2022-06-11    Degree: Master    Type: Thesis
Country: China    Candidate: M Sun    Full Text: PDF
GTID: 2518306572955169    Subject: Applied Mathematics
Abstract/Summary:
In today's big data era, the effective integration of multi-party data resources can promote the development of many industries. However, data sharing is often restricted by privacy regulations. Federated learning allows multiple participants to train a model collaboratively without exposing their local data, but in practical applications it is often accompanied by the problem of non-independent and identically distributed (Non-IID) data among participants, which restricts its further development. It is therefore meaningful to design an effective framework for the Non-IID problem in federated learning. The clustered federated learning algorithm (C-FL) addresses the Non-IID problem by training multiple models and is currently applied within the general clustered federated learning (CFL) framework. However, when clients hold many different data distributions, the whole federated training process requires many rounds of cloud-edge communication; especially when the federated model structure is complex, this incurs a large communication cost. At the same time, transmitting the clients' local model updates risks leaking the privacy of user data.

This thesis studies the training of horizontal federated learning models under Non-IID client data and overcomes the deficiencies of the general CFL framework. Firstly, it proposes a distilled clustered federated learning algorithm (DC-FL) with privacy-protection capability. DC-FL reduces the cloud-edge communication rounds of C-FL by using locally distilled data from each client to guide the server in grouping clients, which ultimately helps each client train its own personalized model. Secondly, based on the DC-FL algorithm, it designs a new framework for clustered federated learning based on distilled data (CFL-D), which overcomes the communication-cost limitation of the general CFL framework when the client data distributions are numerous, guarantees user data privacy, and realizes multi-model federated learning training.

Finally, the effectiveness of the general CFL framework in solving the Non-IID problem of client data is verified experimentally, and the CFL-D framework is implemented. CFL-D is also compared with CFL in terms of communication rounds and traffic. Experiments show that the CFL-D framework meets the requirements of user privacy protection; on the EMNIST dataset, the total number of cloud-edge communications and the total traffic are reduced by 24.62% and 21.29%, respectively, compared with the CFL framework, and the reduction is more pronounced when clients hold more types of data labels.
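To make the DC-FL pipeline described above concrete, the following minimal Python sketch illustrates its three steps: each client summarizes its local data as a handful of distilled examples, the server clusters clients by those summaries, and each cluster averages its members' model updates into a personalized model. Everything here is an illustrative assumption rather than the thesis's exact method: the distilled summaries are simulated instead of being optimized by a real dataset-distillation procedure, k-means stands in for the server's grouping rule, and per-cluster FedAvg stands in for the multi-model training step.

import numpy as np

rng = np.random.default_rng(0)

N_CLIENTS, N_CLUSTERS = 8, 2
DISTILLED_PER_CLIENT, FEAT_DIM, MODEL_DIM = 4, 16, 32

# --- Client side: stand-in for dataset distillation ----------------------
# Real distillation optimizes a few synthetic examples so that training on
# them approximates training on the full local set; here each client's
# summary is simply drawn around one of two latent distributions (Non-IID).
centers = rng.normal(size=(N_CLUSTERS, FEAT_DIM))
true_group = rng.integers(0, N_CLUSTERS, size=N_CLIENTS)
distilled = np.stack([
    centers[g] + 0.1 * rng.normal(size=(DISTILLED_PER_CLIENT, FEAT_DIM))
    for g in true_group
])  # shape: (clients, distilled examples, features)

# Each client also holds a local model update (flattened weights).
local_weights = rng.normal(size=(N_CLIENTS, MODEL_DIM))

# --- Server side: group clients by their distilled summaries -------------
summaries = distilled.mean(axis=1)  # one vector per client

def kmeans(x, k, iters=20):
    """Plain k-means; an illustrative choice for the server's clustering."""
    cent = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((x[:, None] - cent[None]) ** 2).sum(-1), axis=1)
        cent = np.stack([x[labels == j].mean(0) if np.any(labels == j)
                         else cent[j] for j in range(k)])
    return labels

assignment = kmeans(summaries, N_CLUSTERS)

# --- Per-cluster FedAvg: one personalized model per client group ---------
cluster_models = {
    j: local_weights[assignment == j].mean(axis=0)
    for j in range(N_CLUSTERS) if np.any(assignment == j)
}
print("cluster assignment:", assignment)

Because the small distilled summaries, rather than repeated rounds of model updates, are what drive the grouping step, this sketch is consistent with the abstract's claim that distilled data can guide server-side clustering while reducing cloud-edge communication.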
Keywords/Search Tags: federated learning, user clustering, dataset distillation, personalized model, non-independent and identically distributed (Non-IID)