Research On Domain Adaptation Based Cross-Domain Classification Methods

Posted on: 2024-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Zhang
Full Text: PDF
GTID: 2568307106499464
Subject: Computer Science and Technology
Abstract/Summary:
With a new wave of the information revolution and the rapid development of Artificial Intelligence (AI), applying AI to extract useful information from massive data has become important for data mining, data prediction, and other tasks. As AI research and applications deepen, machine learning models have grown more complex, and the volume of data that must be labeled in each field has increased rapidly. Labeling large numbers of samples, however, is time-consuming and laborious. Domain adaptation methods in transfer learning emerged to address this problem: they aim to reduce the differences between domains so that machine learning tasks can be carried out across domains. A domain adaptation model must not only stay accurate on the target task (such as data classification) but also minimize the difference in data distribution between the source domain and the target domain.

In recent years, domain adaptation methods have developed rapidly and achieved high performance on some cross-domain tasks. In practice, however, domain adaptation is limited by factors such as data privacy and performance bottlenecks, which give rise to the following scenarios: (1) source-free domain adaptation, in which the source domain data is inaccessible because of data privacy; (2) active source-free domain adaptation, in which labeling a very small number of target domain samples can greatly improve the otherwise limited adaptation performance. In these new scenarios, current domain adaptation algorithms that otherwise perform well often become ineffective, and their robustness drops sharply. To address this, this thesis proposes a new solution and a corresponding domain adaptation algorithm for each scenario. Focusing on the most common current cross-domain classification tasks in domain adaptation (such as cross-domain image classification and cross-domain text classification), it designs training schemes, proposes algorithms, and verifies them experimentally. The main contributions of this thesis are as follows:

This thesis explains the basic concept of domain adaptation in detail, reviews and summarizes recent cross-domain algorithms based on domain adaptation, and sets out the preconditions of two relatively novel domain adaptation problems (source-free domain adaptation and active source-free domain adaptation).

This thesis proposes a Source-free Implicit Semantic Augmentation (SFISA) algorithm for cross-domain classification under source-free domain adaptation. Because the source domain data cannot be accessed, the thesis designs and trains a source domain class prototype generator whose generated class prototypes simulate the data distribution of the source domain. To make the classifier adaptive, the algorithm implicitly performs semantic augmentation in the direction of the target domain, which improves the classifier's adaptability to target domain data. During training, it maintains diversity and discriminability in the classification space to improve the robustness of the overall algorithm. SFISA lets the source prototypes capture the semantic transformation between domains and incorporates the internal covariances of the target domain, so that the augmented source domain class prototypes carry target semantics.

This thesis proposes a weighted clustering algorithm based on uncertainty and diversity, Weighted Clustering Based on Prototype Distance and Top-T Entropy (PDEWC), for cross-domain classification under active source-free domain adaptation. The algorithm measures the uncertainty of each target domain sample, measures sample diversity by feature space coverage, and selects samples with high label uncertainty and high diversity for labeling and self-training of the model. The core of PDEWC is to cast the uncertainty and diversity measurements as a weighted clustering problem in the source-free domain adaptation scenario. Because uncertainty and diversity are difficult to balance, PDEWC weighs their relative importance in order to find samples that satisfy both criteria for labeling and self-training.

To demonstrate the effectiveness of SFISA and PDEWC, this thesis carries out theoretical analysis and experimental verification (comparative experiments, ablation analysis, robustness analysis, and parameter setting analysis) for both methods. The theoretical analysis and extensive experiments show that SFISA can effectively model the semantic differences between domains and perform cross-domain implicit augmentation based on those differences, which strengthens the classifier's adaptation ability. SFISA achieves performance close to, or even better than, existing domain adaptation methods while keeping the source domain data private. Based on the core idea of self-adaptation, PDEWC can greatly improve cross-domain performance under source-free domain adaptation by labeling a small number of uncertain and representative samples.
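
The abstract does not give PDEWC's formulas, so the following is only a minimal sketch of the general idea it describes: score each target sample's uncertainty, then use those scores as weights in a clustering step so that the selected samples are both uncertain and spread across the feature space. Everything here is an assumption made for illustration and not the thesis's method: the function names, the renormalized top-T entropy, the plain weighted k-means, and the rule of picking the sample nearest each centroid. The prototype-distance term in the algorithm's name is not modeled.

```python
import math
import random


def top_t_entropy(probs, t=3):
    """Entropy of the t largest class probabilities, renormalized to sum to 1 (assumed definition)."""
    top = sorted(probs, reverse=True)[:t]
    total = sum(top)
    return -sum((p / total) * math.log(p / total) for p in top if p > 0)


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(points, weights, k, iters=20, seed=0):
    """Lloyd's algorithm in which each centroid is the weight-averaged mean of its cluster."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    dim = len(points[0])
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: uncertain samples pull the centroid harder.
        for c in range(k):
            members = [i for i, a in enumerate(assign) if a == c]
            mass = sum(weights[i] for i in members)
            if mass > 0:  # keep the old centroid if the cluster is empty
                centroids[c] = [
                    sum(weights[i] * points[i][d] for i in members) / mass
                    for d in range(dim)
                ]
    return centroids


def select_for_labeling(features, probs, budget, t=3):
    """Pick `budget` distinct sample indices: one per uncertainty-weighted cluster."""
    # Small constant keeps fully confident samples from having zero weight.
    weights = [top_t_entropy(p, t) + 1e-8 for p in probs]
    centroids = weighted_kmeans(features, weights, budget)
    chosen = []
    for c in centroids:
        # Nearest not-yet-chosen sample to this centroid.
        best = min(
            (i for i in range(len(features)) if i not in chosen),
            key=lambda i: sq_dist(features[i], c),
        )
        chosen.append(best)
    return chosen
```

In this sketch, `features` would be the target-domain feature vectors from the adapted model, `probs` its softmax outputs, and `budget` the number of samples that can be sent for annotation. One cluster per labeled sample is a simple way to encourage feature-space coverage, and the entropy weights bias each centroid toward uncertain regions.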
Keywords/Search Tags: Transfer Learning, Domain Adaptation, Cross-Domain Classification