
Research On Cross-Distribution Based Image Representation Learning And Its Application

Posted on: 2022-09-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D P Du
Full Text: PDF
GTID: 1488306725471084
Subject: Computer Science and Technology
Abstract/Summary:
Image representation learning allows machines to automatically learn useful features from images and use them to complete specific tasks. It is an important technique supporting applications such as autonomous driving, video surveillance, and intelligent medicine. The rapid development of deep learning in computer vision has greatly advanced research on image representation. In real tasks, however, differences in capture equipment, capture environment, and capture mode mean that images in a given scenario often follow diverse distributions separated by large gaps, which poses various challenges for model training. Flexible and diverse image feature learning techniques are therefore becoming essential.

Based on the characteristics of each data source and how it is used in different scenarios, this thesis studies feature learning for multi-distribution images from the following aspects. First, in multi-distribution deep learning scenarios, training data and test data may come from different distributions, and the training data may itself come from multiple domains with distribution gaps among them. Because deep models tend to overfit the training set, training on a mixture of multi-domain images without extra processing can lead to "catastrophic forgetting", owing to differences in the size and characteristics of the image sets, and this can seriously harm the generalization ability of the model. Second, the growing variety of image capture equipment makes it possible to obtain multimodal images of the same scene, which describe the same visual target with different data distributions, so that the model can learn features from multiple perspectives. For example, in RGB-D applications, depth images are insensitive to illumination changes and have clearer geometric features, so they complement the RGB data well. When multimodal images are used for training, ignoring the correlation between modalities yields a suboptimal solution. Third, modality-specific data is sometimes available only during training and cannot be used at test time because of scenario limitations; this kind of data is called privileged information. How to transfer the effective features of privileged information into the feature learning of the target image, and thereby enhance the feature expression ability of the model, is worth exploring.

To address these challenges, this thesis proposes targeted feature learning techniques. The main work and contributions are summarized as follows:

1. For the case where training images come from multiple domains and the test domain has an unknown data distribution, this thesis proposes a domain generalization feature learning technique based on a cross-domain gating mechanism. Following the "information bottleneck" theory from information theory, we filter out redundant domain-specific information during model training. We exploit the fact that convolutional neural networks differ in their ability to express single-domain and cross-domain features, and use cross-domain activation to locate and eliminate features that are redundant for label prediction while activating more effective features during training. The proposed method alleviates overfitting to a specific distribution in multi-domain training and makes the learned representation more robust to data from unknown domains, thus improving the generalization ability of the model. In addition, to balance information filtering against feature diversity during training, we propose a hierarchical parameter updating strategy that makes training more stable. We perform comprehensive experiments on three domain generalization datasets, and the results are superior to state-of-the-art methods. A further series of experiments verifies the effectiveness of cross-domain gated feature learning in modeling the domain shift problem.

2. For multimodal images, which have large distribution gaps but strong correlation, this thesis proposes a feature learning technique based on cross-modal feature enhancement. The exploration of complementary multimodal information is modeled as the optimization of a cross-modal image translation model, and the expression ability of the network is enhanced through a shared feature space. In the image translation process, we design an image generation function with semantic constraints and introduce image labels to strengthen the model's understanding of image content. The model can also generate high-quality images in the complementary modality, which augment the training data and effectively enhance feature learning. Taking scene recognition as the task, we carry out comprehensive experiments and comparative analysis on two RGB-D indoor scene recognition datasets. The results show that our method effectively exploits the correlation between multimodal data and enhances the representation ability of the multimodal network, exceeding state-of-the-art methods.

3. For image representation learning in privileged information scenarios with large differences in data distribution, this thesis proposes an image feature enhancement technique based on contrastive learning. By maximizing the mutual information between the privileged information and the target image, we transfer the discriminative features of the privileged information into the feature learning of the target image and enhance the model's ability to represent the target image. During training, we pretrain the model on a GAN-based image generation task to obtain a better initial feature representation. We also use image patches instead of whole images to strengthen the sampling process of contrastive learning and improve feature learning. The experimental results show that our method effectively uses privileged information to enhance the feature expression ability of the model and obtains better results than existing methods.
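
The mutual-information objective in the third contribution can be illustrated with a minimal sketch. The abstract does not state which contrastive loss the thesis uses, so the InfoNCE loss below is an assumption chosen because it is a widely used contrastive objective: for N paired samples, log N minus the loss is a lower bound on the mutual information between the two views, so minimizing the loss raises that bound. The function names, the temperature value, and the toy patch features are all hypothetical and only serve the illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(target_feats, privileged_feats, temperature=0.1):
    """Average InfoNCE loss over N paired patch features (illustrative only).

    target_feats[i] and privileged_feats[i] describe the same patch and form
    the positive pair; every privileged_feats[j] with j != i acts as a
    negative for patch i.
    """
    t = [l2_normalize(v) for v in target_feats]
    p = [l2_normalize(v) for v in privileged_feats]
    total = 0.0
    for i, anchor in enumerate(t):
        # Similarity of this target patch to every privileged patch.
        logits = [sum(a * b for a, b in zip(anchor, cand)) / temperature
                  for cand in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matching patch as the correct class.
        total += log_sum - logits[i]
    return total / len(t)


# Toy patch features: two patches, two feature dimensions (made-up values).
rgb_patches = [[1.0, 0.0], [0.0, 1.0]]
depth_aligned = [[1.0, 0.0], [0.0, 1.0]]   # each patch matches its partner
depth_swapped = [[0.0, 1.0], [1.0, 0.0]]   # partners are mismatched

print(info_nce(rgb_patches, depth_aligned))  # small loss: pairs agree
print(info_nce(rgb_patches, depth_swapped))  # large loss: pairs disagree
```

In this sketch, working on patches rather than whole images simply means that each image contributes several rows to the two feature lists, which increases the number of positives and negatives available in each batch.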
Keywords/Search Tags: Cross-distributed images, Image representation learning, Privileged information, Cross-modal feature learning, Domain generalization, Transfer learning