Research On Unsupervised Domain Adaptation In Deep Learning

Posted on: 2023-02-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y M Yin
Full Text: PDF
GTID: 1528306836477494
Subject: Signal and Information Processing
Abstract/Summary:
Over the past decade, intelligent signal and information processing based on deep learning has developed rapidly, and the era of intelligence has arrived ahead of schedule. Information classification is one of the most basic and widely used applications of intelligent information processing, yet the generalization ability of learned models faces serious challenges. Although deep-learning-based classification models achieve accuracy comparable to manual classification on many standardized datasets, the data encountered in real scenarios differ significantly from the data used during training, especially when labels for the application data cannot be obtained in advance. Domain adaptation (DA) uses data from the application domain to adapt the model. It reduces the model's dependence on the distribution of the training data while accommodating the complexity and variability of the application domain, and it has therefore become a research hotspot.

This dissertation starts from the DA problem in deep learning, addresses two popular areas (image classification and molecule screening), and analyzes the shortcomings of deep learning models in generalization. Generalization performance in these applications is improved by studying and applying new deep learning theories, methods, and techniques such as metric learning, pseudo-margins, and screening regression. The main contributions and innovations of this dissertation are as follows:

(1) To address the tendency toward misclassification in image classification, the relationship between the classification errors in the source domain and in the target domain is analyzed during domain alignment of the feature distributions. Drawing on metric learning, the target-domain classification error after domain alignment is further controlled by introducing a triplet loss with dynamic margins in the source domain. On this basis, a new DA algorithm is proposed: metric learning-assisted domain adaptation (MLA-DA). MLA-DA enlarges the classification margin in the source domain, so that the classification boundary leaves more room for target-domain images after domain alignment. Theoretical and experimental results show that MLA-DA achieves better robustness and generalization on target-domain image classification than the classical domain alignment algorithm.

(2) For the Universal Domain Adaptation (UDA) problem in image classification, the output probability distribution of the source classifier on target-domain images is analyzed, and the relationship between these probability distributions and the identification of the common label set is explored. A UDA method based on pseudo-margins (PM) is then proposed to accurately identify the common label set within the label set of the source domain. Because the label set of the target domain is completely unknown in practical applications, a probabilistic model is proposed to represent the likelihood that each source-domain label appears among the target-domain images, and PM vectors are constructed from it. Finally, class-level weighted adversarial training based on the PM vectors aligns the feature distributions of image samples in the common label set as closely as possible. The experimental results show that the UDA approach based on probabilistic models and pseudo-margins can accurately identify images belonging to the common label set and achieves better performance in unknown target domains.

(3) To resolve the tension between diversity and discrepancy in multi-source image classification, a joint domain alignment technique is proposed that simultaneously aligns the common-label-set distributions among the multiple source domains and between the source domains and the target domain. This dissertation pioneers the theory of Universal Multi-Source Domain Adaptation (UMDA). On this basis, a Universal Multi-Source Adaptation Network (UMAN) is designed to further improve performance on UMDA problems. Building on the PM-based UDA research, UMAN introduces a joint multi-source image classifier and a joint adversarial loss, which effectively eliminates the distribution difference between every pair of domains and significantly reduces the system's complexity. Theoretical and experimental results show that the joint multi-source image classifier and the joint domain alignment loss improve the generalization and robustness of deep learning models, especially in complex scenarios with many source domains and large differences among them.

(4) The research above explores how a continuous-valued representation of label relationships can guide adaptation to an unlabeled target domain. This dissertation further generalizes the discrete classification problem into a continuous regression problem and applies it to drug virtual screening (VS), to further verify the potential of unlabeled target domain adaptation for improving the generalization of deep-learning-based regression models. First, given the distribution difference between measured and unmeasured molecular data, a new benchmark dataset for molecular virtual screening is constructed according to the prior distribution of the actual database to be screened, with the aim of comprehensively evaluating deep learning models in real VS settings. On this basis, a new model, Real VS, is proposed. Real VS transfers rich source-domain information from the data of other related targets and adapts the transferred information using domain adaptation theory, which reduces the impact of the mismatch between training and test distributions on the model's generalization and on its classification and screening capabilities for the target. In addition, a graph attention mechanism is employed to study the interpretability of the classification and screening results of Real VS. The experimental results show that, compared with commonly used deep learning methods, Real VS significantly improves classification and screening performance and achieves strong scalability and robustness.

(5) The study above found that the small-sample problem known from image classification is even more pronounced in virtual drug screening. Deep learning models then overfit the training data easily and are difficult to generalize to unlabeled target domains. In response, a new Adversarial Feature Subspace Enhancement (AFSE) technique based on virtual adversarial training (VAT) is proposed to further improve the generalization of deep learning models on small sample sets. Specifically, VAT is performed in the feature subspace to achieve higher feature smoothness while retaining the model's ability to represent activity cliffs, thereby improving generalization on the target task when samples are scarce. The experimental results show that AFSE can be applied to a variety of commonly used graph neural networks and improves several performance measures on a large number of systematically constructed drug VS datasets, including the proportion of high-activity molecules hit, the accuracy of predicted molecular activity, and the agreement of the molecule ranking by activity.

The dissertation concludes with a summary of the full text and an outlook on future research on deep-learning-based unlabeled target domain adaptation.
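As a rough illustration of the margin idea behind contribution (1), the following minimal sketch computes a standard hinge-style triplet loss in plain Python. It is not the MLA-DA implementation: the abstract does not specify how the dynamic margin is scheduled, so the margin here is simply a parameter passed by the caller, and the function names (`euclidean`, `triplet_loss`, `batch_triplet_loss`) and sample vectors are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin`. How `margin` changes during
    training is an assumption left to the caller.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


def batch_triplet_loss(triplets, margin):
    """Mean triplet loss over a list of (anchor, positive, negative)."""
    losses = [triplet_loss(a, p, n, margin) for a, p, n in triplets]
    return sum(losses) / len(losses)


# Hypothetical 2-D features: the positive is 1.0 from the anchor,
# the negative is 5.0 from the anchor.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]
negative = [3.0, 4.0]

loss_small_margin = triplet_loss(anchor, positive, negative, margin=1.0)
loss_large_margin = triplet_loss(anchor, positive, negative, margin=6.0)
```

With the smaller margin the triplet is already separated and contributes no loss, while the larger margin still penalizes it. This is the sense in which a larger source-domain margin demands a wider gap between classes.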
Keywords/Search Tags:Deep learning, domain adaptation, universal multi-source domain adaptation, image classification, metric learning, pseudo-margin, drug virtual screening