| Federated learning (FL) is a new computing paradigm adapted to distributed edge computing, which enables distributed clients to collaboratively learn a model without sharing their raw data. An FL system can protect data privacy by sharing its encrypted model parameters, gradients, etc. In practical application scenarios of federated learning, label noise is inevitable, mainly due to annotators’ errors or the heterogeneity of client devices. Specifically, the inconsistent level of label noise across clients leads to the prominent challenge of label quality disparities, which in turn reduces the accuracy and robustness of the federated learning model. Existing works that address label quality disparities in FL generally rely on an additional and costly benchmark dataset to suppress clients with many noisy labels. However, introducing an external benchmark dataset is itself challenging and would inevitably introduce potential data biases. For example, many facial analysis algorithms released by IBM, Microsoft, and other large companies exhibit varying degrees of skin color or gender bias. Hence, introducing the aforementioned benchmark dataset to clients would bring with it a set of conscious and unconscious data biases. To address these challenges, this thesis proposes a label noise filtering algorithm based on federated learning (FedIMF), which constructs an internal measurement dataset to both evaluate and filter out clients with significant label noise, thereby improving the accuracy and robustness of the federated learning model. The research contents are as follows. First, we design a modified method that extracts a clean and class-balanced internal measurement dataset (IMD) from the client with the most data and the most label categories. Second, we filter out label noise and noisy clients by performing credibility evaluation (CE) on each client based on the
IMD. Specifically, we use the Jensen-Shannon (JS) divergence to evaluate the similarity between the cumulative loss distributions of the IMD and of each client’s local data. Third, we establish the federated learning model based on the credibility values after the label noise has been filtered out. Finally, the effectiveness and robustness of the proposed FedIMF are verified on two public datasets (CIFAR-10 and MNIST). The experimental results show that FedIMF is effective and feasible, and that its accuracy exceeds that of three baseline algorithms. The contributions of this thesis are twofold. First, a modified label noise detection algorithm is designed to extract the internal measurement dataset, which effectively eliminates the potential data biases of an external benchmark dataset. Second, a credibility evaluation method is proposed to distinguish the cumulative loss distributions of clean-label samples from those of closed-set label noise samples, thereby improving the accuracy and robustness of the federated learning model. |
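
The credibility evaluation step described above can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes that per-sample losses of the current model are already available for the IMD and for each client, that both loss sets are binned on a shared grid, and that the cumulative histograms are renormalized before the JS divergence is computed. The function names, the bin count `n_bins`, the score `1 - JSD`, and the `threshold` used in the aggregation are all hypothetical choices made for illustration.

```python
import numpy as np


def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence with base-2 logs, so the value lies in [0, 1]."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)
    kl_pm = np.sum(p * np.log2(p / m))
    kl_qm = np.sum(q * np.log2(q / m))
    return float(np.clip(0.5 * kl_pm + 0.5 * kl_qm, 0.0, 1.0))


def cumulative_loss_profile(losses, bin_edges):
    """Empirical cumulative distribution of the losses on a fixed bin grid."""
    hist, _ = np.histogram(losses, bins=bin_edges)
    return np.cumsum(hist) / max(hist.sum(), 1)


def credibility(imd_losses, client_losses, n_bins=20):
    """Assumed score: 1 - JSD between the two cumulative loss profiles."""
    lo = min(np.min(imd_losses), np.min(client_losses))
    hi = max(np.max(imd_losses), np.max(client_losses))
    if hi <= lo:  # degenerate case: all losses are identical
        hi = lo + 1.0
    edges = np.linspace(lo, hi, n_bins + 1)  # shared grid for both sets
    p = cumulative_loss_profile(imd_losses, edges)
    q = cumulative_loss_profile(client_losses, edges)
    return 1.0 - js_divergence(p, q)


def aggregate(client_params, scores, threshold=0.9):
    """Credibility-weighted average over clients that pass a hypothetical threshold."""
    scores = np.asarray(scores, dtype=float)
    keep = scores >= threshold
    if not keep.any():
        raise ValueError("no client passed the credibility threshold")
    weights = scores[keep] / scores[keep].sum()
    stacked = np.stack(
        [np.asarray(p, dtype=float) for p, k in zip(client_params, keep) if k]
    )
    return np.tensordot(weights, stacked, axes=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    imd = rng.exponential(0.2, size=500)    # synthetic losses of clean IMD samples
    clean = rng.exponential(0.2, size=500)  # synthetic client with clean labels
    noisy = np.concatenate([                # synthetic client with half its labels flipped
        rng.exponential(0.2, size=250),
        rng.normal(3.0, 0.5, size=250),
    ])
    print("clean client:", credibility(imd, clean))
    print("noisy client:", credibility(imd, noisy))
```

In this sketch a client whose loss profile matches that of the IMD receives a score close to 1, and a client with many mislabeled samples receives a lower score because its losses accumulate more slowly. The synthetic losses in the usage example are made up for demonstration and do not reproduce any result from the thesis. A suitable threshold would have to be tuned for the model and dataset at hand.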