As artificial intelligence develops rapidly and is applied ever more widely, the ability of models to generalize across different scenarios has drawn growing attention. Multi-source domain adaptation, a method for addressing cross-domain generalization, has therefore attracted increasing interest in both academia and industry. Current multi-source domain adaptation methods address adaptation from multiple source domains to a single target domain. Although the source domains are related to the target domain, differences in data distribution between them can lead to negative transfer, and there are also differences among the source domains themselves. Selecting high-quality source data helps mitigate this problem. However, existing methods select data by a single unified criterion, ignoring the diversity of the source domains, and in some scenarios features irrelevant to the target domain can cause significant negative transfer. Motivated by these problems, this thesis conducts the following research:

(1) To address the domain adaptation bias caused by differences between the source and target domains in multi-source domain adaptation, this thesis proposes a data selector for multi-source domain adaptation based on the Soft Actor-Critic (SAC) reinforcement learning algorithm, which incorporates an entropy term to encourage exploration. The selector unifies single-source and multi-source adaptation, evaluates the quality of domain data through reinforcement learning rewards, and adjusts the model accordingly. Alignment between the source and target domains is thereby greatly simplified, since it is no longer necessary to align every source domain with the target domain; instead, the model is adapted by using reinforcement learning to select better samples for training.

(2) To address the
differences among multiple source domains in multi-source domain adaptation, this thesis proposes a network selector based on the Deep Deterministic Policy Gradient (DDPG) reinforcement learning algorithm, which includes a parameter predictor that adjusts the model parameters on a per-sample basis, i.e., learns a sample-to-parameter mapping. Since each domain is regarded as a distribution over image samples, domain adaptation is realized statistically through per-sample adaptation of the model. The Dynamic Domain Model Selection (DDMC) module learns how to adapt the model's parameters and the combination of source domains, so alignment between the source and target domains is greatly simplified: the model can easily adapt to target samples lying in any region where the target domain and the source domains align.

(3) To address the problem that features irrelevant to the target domain may cause significant negative transfer, this thesis explores adversarial reinforcement learning for multi-source domain adaptation. Inspired by the reinforcement learning selector and the dynamic model selection above, the data chosen by the selector are used for transfer learning, and the model loss and transfer accuracy serve as rewards fed back to both the selector and the model, so that the selector can judge data quality from rewards and the model can dynamically adjust its parameters, aiming at a co-learning effect.

Guided by multi-source domain adaptation tasks, this thesis analyzes reinforcement-learning-based multi-source domain adaptation. These research results not only effectively supplement current multi-source domain adaptation techniques but also provide new ideas and solutions for them.
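The core idea shared by these contributions, using reward signals from the target domain to decide which source data to learn from, can be illustrated with a toy sketch. This is not the thesis's SAC or DDPG implementation; it substitutes a simple REINFORCE-style softmax selector over two hypothetical source domains (one well aligned with the target, one shifted) and a nearest-centroid classifier as a stand-in model, with target accuracy as the reward:

```python
import math
import random

random.seed(0)


def make_blob(cx, cy, n, spread=0.5):
    # Two-class toy dataset: class 0 around (cx, cy), class 1 around (cx + 3, cy).
    data = []
    for _ in range(n):
        data.append(((cx + random.gauss(0, spread), cy + random.gauss(0, spread)), 0))
        data.append(((cx + 3 + random.gauss(0, spread), cy + random.gauss(0, spread)), 1))
    return data


# Source A matches the target distribution; source B is shifted so its class
# structure conflicts with the target's (a source of negative transfer).
source_a = make_blob(0.0, 0.0, 40)
source_b = make_blob(5.0, 0.0, 40)
target = make_blob(0.0, 0.0, 40)


def centroid_classifier(train):
    # Stand-in "model": the mean point of each class.
    sums = {}
    for (x, y), label in train:
        sx, sy, c = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, c + 1)
    return {k: (sx / c, sy / c) for k, (sx, sy, c) in sums.items()}


def accuracy(model, data):
    correct = 0
    for (x, y), label in data:
        pred = min(model, key=lambda k: (model[k][0] - x) ** 2 + (model[k][1] - y) ** 2)
        correct += pred == label
    return correct / len(data)


# Softmax-policy selector over the two source domains, trained by policy gradient:
# domains whose samples yield high target accuracy earn higher selection preference.
prefs = [0.0, 0.0]
sources = [source_a, source_b]
lr, baseline = 0.5, 0.75

for step in range(200):
    exps = [math.exp(p) for p in prefs]
    probs = [e / sum(exps) for e in exps]
    k = random.choices([0, 1], weights=probs)[0]  # pick a source domain to sample from
    batch = random.sample(sources[k], 20)
    reward = accuracy(centroid_classifier(batch), target)  # target accuracy as RL reward
    # REINFORCE update: raise the preference of domains that earn above-baseline reward.
    for i in range(2):
        grad = (1.0 if i == k else 0.0) - probs[i]
        prefs[i] += lr * (reward - baseline) * grad

print(prefs[0] > prefs[1])  # the selector learns to favor the well-aligned source
```

The shifted source still earns some reward (about 50% accuracy here), so the selector does not discard it outright; the reward baseline determines how aggressively poorly aligned data are down-weighted, mirroring the abstract's point that data selection reduces, rather than eliminates, negative transfer.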