
Research On Indoor Scene Parsing For Multi-modal RGB-D Images

Posted on: 2020-11-07    Degree: Master    Type: Thesis
Country: China    Candidate: L X Hang    Full Text: PDF
GTID: 2428330590458249    Subject: Control Science and Engineering
Abstract/Summary:
With the popularity of depth sensors such as Kinect and the extensive application of deep learning in computer vision, indoor scene parsing based on multi-modal imagery has made great progress. However, deficiencies remain in practical applications. On the one hand, most algorithms rely on RGB-D images during both training and testing, but because color images intrude on user privacy and optical imaging fails in dark environments, it is more realistic to study algorithms that rely only on depth images at test time. On the other hand, deep-learning-based algorithms depend heavily on manually labeled training samples. To reduce the labor and financial cost of re-labeling samples in unfamiliar scenes, it is necessary to study unsupervised algorithms that can generate segmentation results automatically. In view of these problems, this thesis studies indoor scene parsing algorithms based on multi-modal imagery.

For deep-learning-based algorithms that use only depth images in the test phase, we propose a dual-encoder indoor scene parsing algorithm for depth images. Built on the "encoder-decoder" framework, the dual encoder extracts features from two aspects: the two-dimensional depth image and the three-dimensional point cloud. The fused feature maps are more expressive and are sent to the decoder for prediction. Post-processing with a higher-order CRF module guarantees the consistency of the category labels in the parsing results.

To effectively exploit color images to assist feature extraction from depth images during training, we propose a modality knowledge distillation algorithm in which color images guide depth images. Based on a two-stream network and the "learning with privileged information" framework, a model that takes color images as input is trained as the teacher. The feature extraction ability of the student model, which takes depth images as input, is then improved by a modality knowledge distillation loss, yielding a final model that simulates feature extraction from color images while taking only depth images as input. Knowledge distillation between different modalities is thus realized.

For the problem of automatically segmenting indoor scenes in an unsupervised manner, we propose a co-segmentation algorithm based on fused RGB-D modality information. The algorithm exploits the complementary information in depth and color images, which largely resolves the semantic confusion caused by color images of indoor scenes. It clusters superpixels of the input images step by step. Using the proposed bounding-plane prior, the algorithm distinguishes the foreground and background of the input image easily and effectively. A two-stage object hypothesis filtering mechanism, based on a point-cloud objectness metric for foreground objects, further distinguishes different foreground objects under occlusion.
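The dual-encoder idea above can be illustrated with a minimal sketch: one branch encodes the 2D depth map, the other back-projects it into a 3D point cloud and encodes that, and the two feature maps are concatenated for the decoder. All names (encode_2d, encode_3d, dual_encode) and the toy encoders are illustrative assumptions, not the networks used in the thesis.

```python
import numpy as np

def encode_2d(depth):
    """Stand-in 2D encoder: 2x2 average pooling plus a squared channel."""
    h, w = depth.shape
    pooled = depth.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return np.stack([pooled, pooled ** 2], axis=0)  # (C=2, H/2, W/2)

def depth_to_points(depth, fx=500.0, fy=500.0):
    """Back-project a depth map into a 3D point cloud (pinhole model)."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    z = depth
    x = (u - w / 2) * z / fx
    y = (v - h / 2) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)  # (H*W, 3)

def encode_3d(points, h, w):
    """Stand-in 3D encoder: per-point offset from the cloud centroid,
    reshaped back to image layout so it aligns with the 2D features."""
    feats = points - points.mean(axis=0)
    return feats.reshape(h, w, 3).transpose(2, 0, 1)[:, ::2, ::2]  # (3, H/2, W/2)

def dual_encode(depth):
    """Fuse 2D-image and 3D-point-cloud features by concatenation."""
    f2d = encode_2d(depth)
    f3d = encode_3d(depth_to_points(depth), *depth.shape)
    return np.concatenate([f2d, f3d], axis=0)

depth = np.random.rand(8, 8).astype(np.float32) + 0.5
fused = dual_encode(depth)
print(fused.shape)  # (5, 4, 4)
```

In a real model both branches would be learned networks; the point here is only the fusion layout, where both feature maps share a spatial grid so they can be concatenated channel-wise.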
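The modality distillation loss can likewise be sketched: the student (depth input) is trained on its task loss plus a term pulling its features toward the teacher's (RGB input) features. The L2 mimicry term and the weight `alpha` are assumptions for illustration; the thesis's exact loss may differ.

```python
import numpy as np

def distill_loss(student_feat, teacher_feat, task_loss, alpha=0.5):
    """Total loss = supervised task loss + alpha * feature-mimicry term.
    The teacher's features act as fixed targets for the student."""
    mimic = float(np.mean((student_feat - teacher_feat) ** 2))
    return task_loss + alpha * mimic

teacher_feat = np.ones((4, 4))   # features from the RGB teacher stream
student_feat = np.zeros((4, 4))  # features from the depth student stream
loss = distill_loss(student_feat, teacher_feat, task_loss=1.0)
print(loss)  # 1.5
```

Because the mimicry term vanishes when the student reproduces the teacher's features, minimizing this loss drives the depth-only network to behave as if it had seen the color image, which is the "privileged information" effect described above.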
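The bounding-plane prior can be sketched as well: fit a dominant scene plane (floor or wall) to the point cloud and treat points far from it as foreground candidates. The least-squares fit and the tolerance value are simplifying assumptions; the thesis's estimator may be more robust.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane z = a*x + b*y + c through the 3D points."""
    A = np.c_[points[:, 0], points[:, 1], np.ones(len(points))]
    coeffs, *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)
    return coeffs

def split_foreground(points, tol=0.2):
    """Mark points far from the dominant bounding plane as foreground."""
    a, b, c = fit_plane(points)
    residual = np.abs(points[:, 2] - (a * points[:, 0] + b * points[:, 1] + c))
    return residual > tol

# Toy scene: a flat floor grid plus a small box standing on it.
xs, ys = np.meshgrid(np.linspace(0, 1, 10), np.linspace(0, 1, 10))
floor = np.c_[xs.ravel(), ys.ravel(), np.zeros(100)]
box = np.tile([0.5, 0.5, 0.5], (5, 1))
pts = np.vstack([floor, box])
fg = split_foreground(pts)
print(int(fg.sum()))  # 5: only the box points are foreground
```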
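Finally, the two-stage hypothesis filtering can be sketched with a toy objectness score. The compactness-based metric below (fraction of a cluster's points near its centroid) is a stand-in assumption, not the metric defined in the thesis; the point is the coarse-then-fine filtering structure.

```python
import numpy as np

def objectness(points, r=0.2):
    """Stand-in objectness: fraction of points within radius r of the
    cluster centroid (compact blobs score high, scattered sets low)."""
    d = np.linalg.norm(points - points.mean(axis=0), axis=1)
    return float(np.mean(d <= r))

def filter_hypotheses(clusters, coarse=0.3, fine=0.7):
    """Two-stage filtering: a loose pass removes clearly non-object
    clusters, then a strict pass keeps only confident hypotheses."""
    stage1 = [c for c in clusters if objectness(c) > coarse]
    return [c for c in stage1 if objectness(c) > fine]

tight = np.array([[0, 0, 0], [0.05, 0, 0], [0, 0.05, 0], [0, 0, 0.05]])
loose = np.array([[0, 0, 0], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
kept = filter_hypotheses([tight, loose])
print(len(kept))  # 1: only the compact cluster survives
```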
Keywords/Search Tags: Indoor Scene Parsing, Image Segmentation, Multi-modal Information Fusion, Unsupervised Image Segmentation