
Research On Multi-modal Federated Learning Technology For Heterogeneous Modalities

Posted on: 2024-06-16    Degree: Master    Type: Thesis
Country: China    Candidate: T L Wang    Full Text: PDF
GTID: 2568306923952449    Subject: Computer technology
Abstract/Summary:
In recent years, with the widespread deployment of data acquisition devices such as mobile devices, intelligent terminals, and sensors, the volume of generated data has grown explosively. Supported by this massive data, machine learning, and deep neural networks in particular, has demonstrated powerful advantages in fitting complex models. Traditional machine learning stores all raw data centrally on a single server for centralized training, which raises privacy concerns. To address these concerns, researchers proposed federated learning, which allows clients to train machine learning models locally without sharing private data while still achieving the effect of joint training. At the same time, the growing variety of data acquisition devices provides multi-angle, multi-modal data for machine learning, spurring research on multi-modal learning and making it popular in fields such as audio-visual speech recognition, image-text retrieval, and semantic analysis. The distributed training paradigm of federated learning matches the independent, decentralized nature of multi-modal data, so multi-modal federated learning offers unique advantages for machine learning tasks with heterogeneous modalities.

However, several problems remain when this technology is applied to real-world scenarios. First, the data collected by acquisition devices are not always valid: due to factors such as power supply, noise, lighting, and magnetic fields, multi-modal data in real scenarios are likely to be damaged or partially invalid. Second, the information provided by different modalities is both redundant and complementary: the redundant parts jointly reinforce the shared signal, while the complementary parts supply auxiliary information. How to fully extract modality-invariant and modality-specific information from multi-modal data is therefore a question well worth considering. Finally, there is statistical heterogeneity among modalities, meaning that data in different modalities usually have different structures; at the same time, there is systemic heterogeneity among local clients, meaning that different clients have different characteristics and behavior patterns. If the training of heterogeneous modalities and heterogeneous clients is not differentiated, uniformly training a single model that minimizes global loss is likely to be unsatisfactory for every client and modality.

To address these three issues, this thesis proposes a multi-modal federated learning framework for heterogeneous modalities (MM-CFL). First, to handle invalid modalities, an invalid-modality detection module checks whether the data of each modality is valid, and an invalid-modality rebuild module replaces the original invalid features with newly generated features. Second, so that the multi-modal features extracted by the network contain both modality-invariant and modality-specific information, the Soft-HGR correlation metric is added to the model's optimization objective for more comprehensive feature extraction. Third, federated learning is introduced, uploading model parameters instead of raw data, to address privacy protection. Finally, clustering is performed only within the same modality: local clients are divided into sets per modality, and the same deep learning network is trained only for clients with similar behavior and modalities with similar performance, addressing both client heterogeneity and modality heterogeneity.

This thesis conducts extensive experiments on the real-world IEMOCAP dataset in an emotion recognition scenario. The experimental results show that, in the absence of invalid modalities, the proposed framework outperforms existing advanced frameworks on various metrics; in scenarios with invalid modalities, the invalid-modality detection module and the invalid-modality rebuild module both play their intended roles, yielding improvements of 1.6% and 3.3% in prediction accuracy in ablation experiments.
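The invalid-modality detection step described above can be illustrated with a minimal sketch. The thesis does not specify its detection criterion, so the variance-collapse test below (a near-constant signal suggesting sensor dropout or power failure) and the threshold value are assumptions for illustration only.

```python
import numpy as np

def is_invalid_modality(features, var_threshold=1e-3):
    """Flag a modality's feature vector as invalid when its variance collapses,
    e.g. a dead sensor emitting a near-constant signal (illustrative criterion)."""
    return float(np.var(features)) < var_threshold

# A dead channel is flagged; a live, noisy channel is not.
dead = np.zeros(128)
live = np.random.default_rng(0).normal(size=128)
print(is_invalid_modality(dead), is_invalid_modality(live))  # True False
```

In the full framework, samples flagged here would be handed to the rebuild module, which substitutes generated features for the invalid ones.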
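The Soft-HGR correlation term added to the optimization objective can be sketched as follows. As commonly defined, Soft-HGR scores a pair of zero-mean feature mappings f, g by E[f(X)ᵀg(Y)] − ½ tr(Cov(f)·Cov(g)); maximizing it extracts maximally correlated (modality-invariant) components without hard whitening constraints. The NumPy formulation and feature shapes below are illustrative, not the thesis's implementation.

```python
import numpy as np

def soft_hgr(f, g):
    """Soft-HGR correlation between two feature matrices of shape (n_samples, dim).
    Higher values mean the two modalities' features share more correlated structure."""
    f = f - f.mean(axis=0)                      # center both feature sets
    g = g - g.mean(axis=0)
    n = f.shape[0]
    inner = np.mean(np.sum(f * g, axis=1))      # empirical E[f(X)^T g(Y)]
    cov_f = f.T @ f / (n - 1)
    cov_g = g.T @ g / (n - 1)
    # trace term acts as a soft whitening penalty on the feature covariances
    return inner - 0.5 * np.trace(cov_f @ cov_g)
```

During training one would maximize this quantity (i.e. minimize its negative) alongside the task loss, so that identical features score higher than independent ones.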
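The same-modality clustering step can also be sketched. The thesis does not state which clustering algorithm or client representation it uses; the sketch below assumes k-means over flattened client parameter updates, with the key property the abstract describes: clients are grouped and aggregated only within their own modality, never across modalities.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means returning a cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def per_modality_cluster_avg(updates_by_modality, k=2):
    """updates_by_modality: {modality: array (n_clients, n_params)}.
    Clusters clients WITHIN each modality, then averages updates per cluster,
    so each cluster of similar clients gets its own aggregated model."""
    out = {}
    for modality, U in updates_by_modality.items():
        labels = kmeans(U, k)
        out[modality] = {j: U[labels == j].mean(axis=0)
                         for j in set(labels.tolist())}
    return out
```

For example, four audio clients whose updates form two well-separated groups end up with two per-cluster averages rather than one global one, which is the behavior the framework relies on to handle client heterogeneity.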
Keywords/Search Tags: Machine Learning, Multi-modal Learning, Federated Learning, Heterogeneous Modality