Deep Learning Based RGB-D Semantic Segmentation Method

Posted on: 2022-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: H T Chen
GTID: 2518306743983289
Subject: Master of Engineering
Abstract/Summary:
With the development of computer technology, robots are used to assist human activities and can even independently identify the types of objects in their working environment. In recent years, giving robots the ability to understand images of indoor scenes has become one of the goals of image semantic segmentation. With continuing progress in hardware and deeper research into neural networks, segmentation from RGB-D data now outperforms segmentation from RGB images alone, so accurate semantic segmentation of indoor scenes is possible with the help of depth maps. In this context, this thesis discusses the development and key points of semantic segmentation and convolutional neural networks, analyzes the design of fully convolutional network models for the semantic segmentation task as well as techniques for fusing RGB and depth information, and proposes two RGB-D semantic segmentation models for indoor images that build on existing techniques.

The first model is an RGB-D semantic segmentation model based on a depth-aware operator and dilated convolution. The depth-aware operator feeds the geometric information of the 2D depth image directly into the network computation, which reduces data conversion and the loss of image information. Dilated convolution gives the model a larger receptive field without downsampling. With a backbone network improved on the basis of DeepLab, the extracted features carry richer semantic information, which improves the accuracy of the model's semantic predictions. This thesis also explores a layered enhancement scheme to better exploit the fusion of RGB and depth images. As a result, the model computes segmentation results quickly and suits scenarios that demand a fast response. In the final experiment, the mIoU was 38.0% on the SUN RGB-D dataset and 35.1% on the NYU Depth V2 dataset.

The second model is an RGB-D semantic segmentation model based on an attention mechanism. It uses a residual neural network as the overall framework, and an attention mechanism trained end to end processes the semantic information shared between the RGB and depth images. A skip fusion structure is added to combine information from the down-sampling and up-sampling paths and so improve segmentation accuracy. The attention mechanism captures the depth and RGB features that are useful for the target task, and a feature fusion module and a context module significantly improve the model's recognition results. In addition, ablation experiments demonstrate the effectiveness of the context module, the skip connections, and the squeeze-and-excitation feature fusion module on both datasets. In the final experiment, the mIoU was 49.8% on the NYU Depth V2 dataset and 48.3% on the SUN RGB-D dataset. Compared with similar algorithms, the recognition accuracy of this model is significantly improved, and it is suitable for object recognition in indoor scenes.
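
Both models are evaluated by mean intersection over union (mIoU). As a reminder of what that figure measures, the following is a minimal sketch in plain Python, not the thesis's evaluation code. The function names, the flattened-label input format, the optional `ignore_index` argument, and the choice to skip classes that appear in neither the prediction nor the ground truth are all assumptions made for illustration; evaluation protocols for SUN RGB-D and NYU Depth V2 may differ in these details.

```python
from typing import List, Optional, Sequence


def confusion_matrix(
    pred: Sequence[int],
    target: Sequence[int],
    num_classes: int,
    ignore_index: Optional[int] = None,  # hypothetical "unlabeled" class id
) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels do not contribute to the score
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix: List[List[int]]) -> float:
    """Average the per-class IoU = TP / (TP + FP + FN) over observed classes."""
    num_classes = len(matrix)
    ious = []
    for c in range(num_classes):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp
        fp = sum(matrix[r][c] for r in range(num_classes)) - tp
        denom = tp + fp + fn
        if denom == 0:
            continue  # assumption: skip classes absent from both pred and target
        ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six pixels (flattened label maps).
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(confusion_matrix(pred, target, num_classes=3))
print(f"mIoU = {score:.3f}")
```

In the toy example the per-class IoU values are 1/3, 2/3 and 1/2, so the mean is 0.5. Because every class carries equal weight regardless of how many pixels it covers, rare indoor categories affect mIoU as much as large ones such as walls and floors.
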
Keywords/Search Tags:deep learning, semantic segmentation, RGB-D images, attention mechanism, dilated convolution