Font Size: a A A

Research On Scene Image Recognition And Segmentation Based On Contrastive Learning

Posted on:2022-10-11Degree:MasterType:Thesis
Country:ChinaCandidate:C Y LiFull Text:PDF
GTID:2518306725981239Subject:Computer technology
Abstract/Summary:PDF Full Text Request
Scene recognition and semantic segmentation as two important basic tasks in the field of computer vision have been the focus of researchers.Scene recognition needs to accurately identify the semantics of multiple objects in the scene,and consider the relationship between different objects in the scene and the overall environment in order to accurately classify the scene images.Semantic segmentation requires not only accurate understanding of the semantic information of the scene content,but also spatial positioning of each object and even pixel-level information.Therefore,scene recognition and semantic segmentation are very challenging.In recent years,benefiting from the development of large datasets and high performance GPU resources,both scene recognition and semantic segmentation have achieved remarkable results.In the task of combining multi-modal data for scene recognition,researchers mostly use multiple branch networks to train models of different modality separately,and then fuse the features of multiple branches or use modality translation to fuse the multi-modal data.Due to the lack of the corresponding initialization model and the difference of the data distribution among the multi-modal data,such methods can only use part of the valid information of the multi-modal data.In terms of semantic segmentation task,the current mainstream methods all use spatial pyramid structure for semantic segmentation.Although this method can obtain useful multi-scale information,it is difficult to learn important overall structure information.Considering the outstanding performance of contrastive learning in self-supervised learning,the combination of contrastive learning with scene recognition and semantic segmentation can make full use of contrastive learning to improve the ability of backbone network to distinguish relevant samples from irrelevant samples,so as to improve the performance of scene recognition and semantic segmentation.Aiming at the task characteristics of scene recognition and semantic segmentation,this paper designs different contrastive learning frameworks and uses different modality to conduct experiments.Specifically,the main contributions of this paper are as follows:? In the task of scene recognition based on contrastive learning,a self-supervised feature learning framework is designed to train the initialization model of the scene recognition task,and contrastive learning is used to improve the ability of the backbone network to distinguish between relevant samples and irrelevant samples.In this paper,RGB image and Depth image are taken as the input modality and target modality of the model respectively,and the difference of data distribution between RGB image and Depth image is reduced through modality translation.This paper also uses a generative adversarial module to constrain the high-level semantic features of the generated modality and the target modality to further enhance the ability of the backbone network to encode relevant samples.Considering that the scene recognition task pays more attention to high-level semantic information,this paper uses the high-level semantic features extracted from deep convolutional networks to construct positive and negative samples for contrastive learning.?In the task of semantic segmentation based on contrastive learning,this paper combines contrastive learning and semantic segmentation to form a multi-task model.In this paper,the RGB image and the colorized semantic segmentation label are taken as the input modality and the target modality respectively because the colorized semantic segmentation label has clear overall structure information.The modality translation operation is also used to reduce the data distribution difference between the RGB image and colorized semantic segmentation label.Considering the importance of the overall structure information of the scene for semantic segmentation,this paper uses the overall structure features extracted from shallow convolutional networks to construct positive and negative samples for contrastive learning.The two techniques based on contrastive learning proposed in this paper are conducted detailed experiments on the corresponding open datasets respectively.The experimental results show that the two techniques presented in this paper achieve outstanding performance significantly higher than the baseline.
Keywords/Search Tags:scene recognition, semantic segmentation, contrastive learning, multi-modal data, multi-task learning
PDF Full Text Request
Related items