
Research On Federated Learning Algorithm Under Data Label Distribution Skew

Posted on: 2024-05-30    Degree: Master    Type: Thesis
Country: China    Candidate: H B Zhang    Full Text: PDF
GTID: 2568307094484514    Subject: Computer technology
Abstract/Summary:
In recent years, relevant countries and institutions have paid increasing attention to privacy protection, hoping to obtain high-performance machine learning models without disclosing sensitive data. Federated learning answers this need by training machine learning models in a decentralized way while the data stays local: during training, each client holds its private data, which is never released publicly, and exchanges only the machine learning model with the server. However, model training under this privacy-preserving setting faces many problems and challenges, and non-independently and identically distributed (non-IID) data is one of the most important factors degrading the performance of the global model. This paper therefore conducts in-depth research on label distribution skew across clients, a typical form of non-IID data. The specific work is as follows:

(1) A training method and aggregation strategy based on weighting positive and negative gradient samples is proposed. Starting from model training and back-propagation, the paper first analyzes how data label distribution skew affects the gradient, and then introduces the concepts of positive and negative samples and of positive and negative gradients. The method modifies the softmax classification function and the loss function of conventional training so that, during local training, the classes present on a client are affected only by the positive and negative gradients of their labeled samples, while for the classes missing from a client, the negative gradients contributed by samples of other labels are discarded entirely (see the masked-softmax sketch below). In addition, when client models are uploaded to the server, the differences between them are measured by cosine similarity and used in aggregation (see the aggregation sketch below), which improves the convergence and speed of the federated learning algorithm.

(2) Under federated learning based on weighted positive and negative gradient samples, the multi-class optimization problem addressed by the global model decomposes into client-side sub-optimization problems that fit the local data distributions, and the convergence of the global model is promoted by slowing down local model optimization. However, relative to the global optimization problem, the decomposed sub-problems lose the supervision information carried by negative-class samples. To improve the generalization performance of the global model, a regularization strategy is therefore proposed that optimizes the angle of each classification vector in the feature space to obtain better classification boundaries (see the regularizer sketch below).
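To make (1) concrete, below is a minimal PyTorch sketch of one way to discard the negative gradients of classes absent from a client: masking those classes out of the local softmax. The function name, the hard -1e9 mask, and the example class set are illustrative assumptions; the thesis's actual positive/negative gradient weighting may be softer than a hard mask.

```python
import torch
import torch.nn.functional as F

def masked_softmax_loss(logits, targets, present_classes):
    # Classes absent from this client get a very negative logit, so the
    # softmax assigns them (near-)zero probability and their logits
    # receive (near-)zero negative gradient during back-propagation.
    mask = torch.full((logits.size(1),), -1e9, device=logits.device)
    mask[present_classes] = 0.0
    return F.cross_entropy(logits + mask, targets)

# Example: a client holding only samples of classes 0, 3 and 7.
present = torch.tensor([0, 3, 7])
logits = torch.randn(16, 10, requires_grad=True)
targets = present[torch.randint(0, 3, (16,))]  # local labels come from present classes
masked_softmax_loss(logits, targets, present).backward()
# logits.grad is now zero in every column belonging to an absent class.
```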
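The abstract states only that the server measures inter-client model differences with cosine similarity and aggregates accordingly. One plausible reading, sketched below, weights each client's update by its cosine similarity to the mean update direction; the function, the clamping of negative similarities, and the normalization are all assumptions, not the thesis's stated rule.

```python
import torch
import torch.nn.functional as F

def cosine_weighted_aggregate(global_params, client_updates):
    # client_updates: one list of per-parameter delta tensors per client.
    flat = [torch.cat([p.flatten() for p in u]) for u in client_updates]
    mean_dir = torch.stack(flat).mean(dim=0)
    # Cosine similarity of each client's update to the mean direction.
    sims = torch.stack([F.cosine_similarity(f, mean_dir, dim=0) for f in flat])
    weights = torch.clamp(sims, min=0.0)            # ignore opposing updates
    weights = weights / weights.sum().clamp(min=1e-12)
    # Apply the similarity-weighted average of the updates.
    return [g + sum(w * u[i] for w, u in zip(weights, client_updates))
            for i, g in enumerate(global_params)]
```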
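For (2), a common way to regularize the angles of class embedding vectors is to penalize the pairwise cosine similarities of the classifier's weight rows, pushing the class directions apart in feature space. The sketch below illustrates that generic idea under the stated assumption; it is not necessarily the thesis's exact regularizer.

```python
import torch
import torch.nn.functional as F

def class_angle_regularizer(class_embeddings):
    # class_embeddings: [C, d] matrix of classifier weight vectors.
    w = F.normalize(class_embeddings, dim=1)
    cos = w @ w.t()                            # pairwise cosines, [C, C]
    c = cos.size(0)
    off_diag = cos - torch.eye(c, device=cos.device)
    # Mean squared off-diagonal cosine: zero when class vectors are orthogonal.
    return off_diag.pow(2).sum() / (c * (c - 1))

# Used as an extra term on the local objective, e.g.
#   loss = task_loss + lam * class_angle_regularizer(model.fc.weight)
# where lam is a hypothetical tuning coefficient, not from the thesis.
```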
Keywords/Search Tags: Federated learning, Non-independently and identically distributed data, Positive and negative gradients, Class embedding vector regularization