
Distilled Federated Learning Based On Tree Model

Posted on: 2022-10-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Shan
Full Text: PDF
GTID: 2518306572455174
Subject: Applied Mathematics
Abstract/Summary:
With the development of artificial intelligence technology, data-driven AI techniques are playing important roles in many aspects of society. However, because data are often insufficient in quantity and quality, and because privacy-protection agreements bind some enterprises, the AI industry still faces two major problems: data barriers between organizations and privacy protection. Federated learning has emerged as a way to address both. It enables participating enterprises to learn collaboratively across devices while keeping raw data private: through this special form of distributed learning, enterprises exchange model update information (i.e., gradient information) and jointly train a federated model, thereby sharing resources. Nevertheless, three key problems in federated learning remain to be solved: communication cost, data heterogeneity, and privacy protection.

This thesis proposes a new federated learning framework based on distilled data, which transmits synthetic distilled data between clients instead of the parameter updates used in traditional federated learning, in order to address these three problems. Whereas traditional dataset distillation algorithms target image classification, structured data in industry is commonly handled by tree ensemble models such as random forest and XGBoost. The thesis therefore adapts the algorithm specifically to structured data, proposing a new dataset distillation algorithm based on decision trees that follows the way tree models are generated: it aggregates the original dataset and the model information into a small synthetic dataset. In addition, the thesis constructs a distilled-data-based federated learning framework built on tree models, which can select different tree models or tree ensembles, such as random forest, according to the type of data. The algorithm is verified both theoretically and experimentally: the distilled data are synthetic and do not reveal the specific underlying records. Transmitting distilled data allows distilled federated learning to reduce communication cost, balance the data distribution across clients, and bring the learning performance close to the theoretical upper bound of federated learning.
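The overall round-trip described above (clients compress local data into a few synthetic points, the server pools them and fits a tree model) can be illustrated with a minimal sketch. Note the assumptions: this is not the thesis's actual distillation algorithm; here distillation is approximated naively by per-class feature means, the "tree" is a one-dimensional decision stump, and the `distill`/`fit_stump` names are hypothetical.

```python
# Sketch of one round of distilled federated learning on 1-D data.
# Assumption: per-class means stand in for true dataset distillation,
# and a depth-1 stump stands in for a full tree ensemble.
from statistics import mean

def distill(client_data):
    """Compress a client's (x, y) pairs to one synthetic point per class."""
    by_class = {}
    for x, y in client_data:
        by_class.setdefault(y, []).append(x)
    return [(mean(xs), y) for y, xs in by_class.items()]

def fit_stump(points):
    """Fit a 1-D decision stump: threshold halfway between the two class centers."""
    by_class = {}
    for x, y in points:
        by_class.setdefault(y, []).append(x)
    centers = sorted((mean(xs), y) for y, xs in by_class.items())
    threshold = (centers[0][0] + centers[1][0]) / 2
    low_label, high_label = centers[0][1], centers[1][1]
    return lambda x: low_label if x <= threshold else high_label

# Two clients with skewed (non-IID) data: client A holds mostly class 0,
# client B mostly class 1.
client_a = [(0.1, 0), (0.3, 0), (0.2, 0), (2.9, 1)]
client_b = [(3.1, 1), (2.8, 1), (3.3, 1), (0.4, 0)]

# Each client transmits only its distilled points -- never raw records
# or gradients -- and the server trains on the pooled synthetic set.
pooled = distill(client_a) + distill(client_b)
model = fit_stump(pooled)

print(model(0.2), model(3.0))  # -> 0 1
```

The sketch also shows why this can help with heterogeneity: even though each client's label distribution is skewed, the pooled synthetic set contains points from every class each client observed, so the server sees a more balanced picture than any single client's raw data.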
Keywords/Search Tags: federated learning, dataset distillation, decision tree, non-independent and identically distributed (non-IID) data