With the development of technologies such as artificial intelligence and computer hardware, the field of computer vision increasingly employs deep learning to address its problems. Image semantic segmentation, a fundamental task in computer vision, is applied in many areas such as intelligent transportation and medical diagnosis, so research on it is of great importance. Indoor scenes are structurally complex, and RGB images provide only color information, which blurs the boundaries between objects of similar color. Depth images supply the corresponding geometric relationships for RGB images and preserve the spatial information of objects, so combining the two can effectively improve segmentation. Semantic segmentation based on complementary RGB and depth images has therefore gradually become a popular research direction in image processing.

The focus of this paper is semantic segmentation of indoor scenes using RGB-D images. The main research components are as follows:

(1) The relevant theory of deep learning applied to RGB-D image segmentation is studied, the existing problems in multi-modal fusion and multi-scale fusion for RGB-D semantic segmentation are analyzed, and research is carried out on these problems.

(2) To account for the differences and complementarity between RGB and depth images, an attention-guided multi-modal cross-fusion segmentation network (ACFNet) is proposed to integrate the two modalities effectively. The network adopts an encoder-decoder structure with an asymmetric dual-stream feature extraction network, and a global-local feature extraction module (GL) is added to the RGB encoder. To fuse RGB and depth features effectively, an attention-guided multi-modal cross-fusion module (ACFM) is proposed so that the enhanced fused representations can be exploited at multiple stages (a schematic sketch of this fusion step is given below). Experiments show that ACFNet significantly improves the segmentation of indoor scenes.

(3) To handle the varying sizes of target objects in indoor scenes, an RGB-D semantic segmentation network (EMFNet) is proposed that fuses the multi-scale features extracted by the encoder. A pooling multi-scale fusion module (PMFM) is proposed to exploit the multi-scale features obtained at the encoder stage, and a multiple skip connection module (MSCM) is designed to reuse the details lost during down-sampling (a sketch of the pooling-based multi-scale fusion also follows below). Experiments show that EMFNet outperforms ACFNet and the other semantic segmentation methods compared.
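As an illustration of the cross-modal fusion idea in item (2), the following PyTorch sketch shows one way attention-guided fusion of RGB and depth feature maps can be written. The module name, the channel-attention form, and the cross-weighting scheme are assumptions for illustration only; the abstract does not specify the internals of the ACFM.

```python
# Minimal sketch (not the thesis implementation) of attention-guided
# fusion of RGB and depth features, in the spirit of the ACFM.
import torch
import torch.nn as nn


class CrossModalAttentionFusion(nn.Module):
    """Fuse RGB and depth features of matching shape (assumed design).

    Each modality produces channel-attention weights (squeeze-and-excitation
    style) that re-weight the other modality before the two are summed.
    """

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()

        def channel_gate():
            return nn.Sequential(
                nn.AdaptiveAvgPool2d(1),                      # global context per channel
                nn.Conv2d(channels, channels // reduction, 1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1),
                nn.Sigmoid(),
            )

        self.rgb_gate = channel_gate()    # attention derived from RGB features
        self.depth_gate = channel_gate()  # attention derived from depth features

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        # Cross re-weighting: each modality is modulated by the other's gate,
        # then the refined features are merged and passed on to the decoder.
        rgb_refined = rgb_feat * self.depth_gate(depth_feat)
        depth_refined = depth_feat * self.rgb_gate(rgb_feat)
        return rgb_refined + depth_refined


if __name__ == "__main__":
    fusion = CrossModalAttentionFusion(channels=64)
    rgb = torch.randn(2, 64, 60, 80)    # e.g. one encoder stage of the RGB stream
    depth = torch.randn(2, 64, 60, 80)  # matching depth-stream features
    print(fusion(rgb, depth).shape)     # torch.Size([2, 64, 60, 80])
```

In the abstract's description, a fusion step of this kind would be applied at each of the multiple encoder stages where RGB and depth features are combined.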
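Similarly, for item (3), the sketch below shows a pooling-based multi-scale fusion step in the spirit of the PMFM, written here as PSPNet-style pyramid pooling over the deepest encoder features. The pool sizes, channel counts, and module structure are assumptions; the actual PMFM and MSCM may differ.

```python
# Minimal sketch (assumed design, not the thesis code) of pooling-based
# multi-scale fusion: features are pooled at several scales, re-projected,
# up-sampled, and concatenated so that objects of different sizes are
# represented before decoding.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PoolingMultiScaleFusion(nn.Module):
    def __init__(self, in_channels: int, out_channels: int, pool_sizes=(1, 2, 3, 6)):
        super().__init__()
        branch_channels = out_channels // len(pool_sizes)
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(size),                  # pool to a coarse grid
                nn.Conv2d(in_channels, branch_channels, 1),  # reduce channels
                nn.BatchNorm2d(branch_channels),
                nn.ReLU(inplace=True),
            )
            for size in pool_sizes
        ])
        # Project the concatenation of the input and all pooled branches.
        self.project = nn.Conv2d(
            in_channels + branch_channels * len(pool_sizes), out_channels, 1
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[2:]
        pyramid = [x] + [
            F.interpolate(branch(x), size=(h, w), mode="bilinear", align_corners=False)
            for branch in self.branches
        ]
        return self.project(torch.cat(pyramid, dim=1))


if __name__ == "__main__":
    pmfm_like = PoolingMultiScaleFusion(in_channels=512, out_channels=256)
    feat = torch.randn(2, 512, 15, 20)  # deepest encoder features
    print(pmfm_like(feat).shape)        # torch.Size([2, 256, 15, 20])
```

The MSCM described in the abstract would complement such a module by carrying early-stage, high-resolution features to the decoder through multiple skip connections, recovering details lost during down-sampling; its exact form is not specified here.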