Research On Label Noise Filtering Algorithm Based On Federated Learning

Posted on:2022-03-08

Degree:Master

Type:Thesis

Country:China

Candidate:G C Gao

Full Text:PDF

GTID:2568306326976949

Subject:Computer technology

Abstract/Summary:


Federated learning (FL) is a new computing paradigm adapted to distributed edge computing, which enables distributed clients to collaboratively learn a model without sharing their raw data. An FL system can protect data privacy by sharing only encrypted model parameters, gradients, and similar information. In practical FL applications, label noise is inevitable, mainly because of annotators' errors or the heterogeneity of client devices. Specifically, the level of label noise differs from client to client, which leads to the prominent challenge of label quality disparities and in turn reduces the accuracy and robustness of the federated learning model. Existing work on label quality disparities in FL generally relies on an additional and costly benchmark dataset to suppress clients with many noisy labels. However, introducing an external benchmark dataset is itself challenging and inevitably introduces potential data biases. For example, many facial analysis algorithms released by IBM, Microsoft, and other large companies exhibit varying degrees of skin color or gender bias. Therefore, introducing such a benchmark dataset to clients would bring with it a set of conscious and unconscious data biases.

To address these challenges, this thesis proposes a label noise filtering algorithm based on federated learning (FedIMF), which constructs an internal measurement dataset to both evaluate and filter clients with significant label noise, thereby improving the accuracy and robustness of the federated learning model. The research contents are as follows. First, we design a modified method that extracts a clean and class-balanced internal measurement dataset (IMD) from the client with the most data and the most label categories. Second, we filter out label noise and noisy clients by performing credibility evaluation (CE) on each client based on the IMD. Specifically, we use the Jensen-Shannon (JS) divergence to evaluate the similarity between the cumulative loss distribution of the IMD and that of each client's local data. Third, we establish the federated learning model based on the credibility values after the label noise has been filtered out. Finally, the effectiveness and robustness of the proposed FedIMF are verified on two public datasets (CIFAR-10 and MNIST).

The experimental results show that FedIMF is effective and feasible, and that its accuracy exceeds that of three baseline algorithms. The contributions of this thesis are twofold. First, a modified label noise detection algorithm is designed to extract the internal measurement dataset, which would effectively eliminate the potential data biases. Second, a credibility evaluation method is proposed to distinguish the cumulative loss distribution of clean-label samples from that of closed-set label noise samples, which improves the accuracy and robustness of the federated learning model.
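
The credibility evaluation step can be illustrated with a minimal sketch. This is not the thesis's implementation, and several details are assumptions made only for illustration: the abstract does not say how losses are binned or how divergence is mapped to a credibility value. Here a binned histogram of per-sample losses stands in for the loss distribution, credibility is taken as 1 minus the JS divergence, and clients are filtered by a hypothetical threshold before a credibility-weighted average of their parameters. The function names (`loss_histogram`, `credibility`, `aggregate`) are likewise hypothetical.

```python
import math
from typing import Dict, List, Sequence


def loss_histogram(losses: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Normalized histogram of per-sample losses over shared bin edges."""
    if not losses:
        raise ValueError("losses must be non-empty")
    counts = [0.0] * (len(edges) - 1)
    for loss in losses:
        # Walk to the bin containing the loss; losses outside the edge
        # range are clamped into the first or last bin.
        idx = 0
        while idx < len(counts) - 1 and loss >= edges[idx + 1]:
            idx += 1
        counts[idx] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence; with base-2 logs it lies in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def credibility(imd_losses: Sequence[float],
                client_losses: Sequence[float],
                edges: Sequence[float]) -> float:
    """Assumed score: 1 - JSD, so identical loss distributions score 1."""
    p = loss_histogram(imd_losses, edges)
    q = loss_histogram(client_losses, edges)
    return 1.0 - js_divergence(p, q)


def aggregate(client_params: Dict[str, List[float]],
              scores: Dict[str, float],
              threshold: float) -> List[float]:
    """Drop clients below the (assumed) threshold, then average the
    remaining parameter vectors weighted by credibility."""
    kept = {cid: s for cid, s in scores.items() if s >= threshold}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("no credible clients left after filtering")
    dim = len(next(iter(client_params.values())))
    return [sum(kept[c] * client_params[c][i] for c in kept) / total
            for i in range(dim)]
```

Under these assumptions, a client whose losses on its local data are distributed like the losses on the IMD receives a score near 1, while a client dominated by mislabeled samples, whose losses concentrate in the high bins, receives a score near 0 and is excluded from aggregation.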

Keywords/Search Tags:

Federated learning, Label quality disparities, Internal measurement dataset, Label noise filtering


Related items

1	Researches On The Estimation And Filtering Methods Of Numerical Label Noise
2	Label Noise Filtering Method Based On Confidence Distribution
3	Research On Label Noise Filtering Learning Algorithm Based On Multi-granularity
4	Research And Implementation Of Federated Learning Optimization Mechanism For Noise Awareness In Heterogeneous Environment
5	Research On Label Noise Based On Ensemble Learning
6	Improved Label Noise Filtering Method Based On Active Learning
7	Noise Tolerated Weakly Supervised Multi-Label Learning With Label Enrichment
8	Fault Diagnosis Algorithms In The Presence Of Label Noise
9	A Research Of Multi-label Learning Focused On Label Correlation And Label Enhancement
10	Research On Multi-label Learning And Algorithms Based On Data And Label Correlations