
Research On Semantic Segmentation Method Of Indoor Image Based On Multi-modal Feature Fusion

Posted on: 2022-10-19
Degree: Master
Type: Thesis
Country: China
Candidate: W Wu
Full Text: PDF
GTID: 2518306539491974
Subject: Computer Science and Technology

Abstract/Summary:
In complex indoor environments, uneven illumination, many objects of different sizes, high color and texture similarity between objects, and mutual occlusion cause RGB-only segmentation methods to recognize object boundaries poorly and to misclassify categories, resulting in low accuracy. Semantic segmentation methods that combine RGB images with depth images, which carry spatial geometric information, use the position and depth-level information of objects to supplement color features and thereby improve segmentation performance. However, the information contained in RGB images and in depth images differs inherently, so exploring an effective method for extracting and fusing RGB-D multi-modal features is challenging.

First, a Multi-modal Feature fusion Convolutional Neural Network (MFCNN) model is proposed for indoor semantic segmentation of RGB-D images. The model consists of an encoding part and a decoding part. In the encoding stage, a multi-modal feature encoding structure treats the RGB image and the depth image of an RGB-D pair as two independent modalities for feature extraction, and fuses the RGB and Depth modal features layer by layer into an RGB-D fusion modality; separating the extraction and fusion operations of each modality lets the network obtain more feature information. In the decoding stage, deconvolution is used for up-sampling, and cross-layer multi-modal connections between the successively up-sampled RGB-D fusion features and the repeatedly up-sampled decoder features capture more contextual feature information. Exploiting multi-modal information from receptive fields of different scales at every level improves the robustness of target recognition and the segmentation accuracy of the model.

Second, to address inter-class misclassification in the semantic segmentation maps of the MFCNN model, a double attention mechanism and multi-modal feature fusion deep neural network (DAM-MFDNN) model is proposed to study the relationships among RGB-D multi-modal image features. A multi-modal supplementary attention module computes the correlation between the channel features of the RGB-D image, assigns larger weights to important feature channels, and supplements feature information through multi-channel fusion, strengthening the salient expression of features and yielding high-quality feature information. A multi-modal global attention module computes the dependence of global semantic information on deep RGB and Depth features, strengthening the features' ability to express semantic information and the model's discriminative power, so as to obtain more accurate feature maps.

Finally, the validity of the above models is verified on the NYU Depth V2 and SUN RGB-D datasets. Compared with currently popular semantic segmentation methods, the models proposed in this thesis achieve high semantic segmentation accuracy and show certain advantages under objective evaluation.
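To make the layer-by-layer RGB-D fusion encoding of MFCNN concrete, the following is a minimal PyTorch sketch. The abstract does not specify the backbone, channel widths, or fusion operator, so conv_block, FusionEncoder, and all widths here are hypothetical illustrations, not the thesis's actual implementation.

import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # 3x3 convolution + batch norm + ReLU: the generic building block assumed here
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class FusionEncoder(nn.Module):
    """Two independent encoder streams (RGB and Depth) whose features are
    fused layer by layer into a third, RGB-D fusion stream. Per-level fusion
    outputs are returned for cross-layer connections to a decoder."""
    def __init__(self, rgb_in=3, depth_in=1, widths=(64, 128, 256)):
        super().__init__()
        self.rgb_layers = nn.ModuleList()
        self.depth_layers = nn.ModuleList()
        self.fuse_layers = nn.ModuleList()
        r_in, d_in, f_in = rgb_in, depth_in, 0
        for w in widths:
            self.rgb_layers.append(conv_block(r_in, w))
            self.depth_layers.append(conv_block(d_in, w))
            # the fusion layer sees both streams plus the previous fusion output
            self.fuse_layers.append(conv_block(2 * w + f_in, w))
            r_in = d_in = f_in = w
        self.pool = nn.MaxPool2d(2)

    def forward(self, rgb, depth):
        fused_per_level = []
        fusion = None
        for rgb_l, dep_l, fus_l in zip(self.rgb_layers, self.depth_layers, self.fuse_layers):
            rgb, depth = rgb_l(rgb), dep_l(depth)
            parts = [rgb, depth] if fusion is None else [rgb, depth, fusion]
            fusion = fus_l(torch.cat(parts, dim=1))
            fused_per_level.append(fusion)  # kept pre-pooling for skip connections
            rgb, depth, fusion = self.pool(rgb), self.pool(depth), self.pool(fusion)
        return fused_per_level

enc = FusionEncoder()
feats = enc(torch.randn(1, 3, 64, 64), torch.randn(1, 1, 64, 64))
print([f.shape for f in feats])  # (1,64,64,64), (1,128,32,32), (1,256,16,16)

Each level's fusion output is stored before pooling, so a decoder can up-sample and attach to these features to form the cross-layer multi-modal connections described above.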
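The multi-modal supplementary attention module is likewise described only at a high level. The sketch below is a squeeze-and-excitation-style stand-in under that reading: pool each modality's channels globally, derive per-channel weights from their joint statistics, reweight, and fuse. ChannelAttentionFusion and its parameters are assumptions for illustration, not the thesis's formulation.

import torch
import torch.nn as nn

class ChannelAttentionFusion(nn.Module):
    """Weight RGB and Depth feature channels by their correlated global
    statistics, then fuse the reweighted channels into one feature map."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        # per-channel weights derived from the concatenated global statistics
        self.mlp = nn.Sequential(
            nn.Linear(2 * channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, 2 * channels),
            nn.Sigmoid(),
        )
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb_feat, depth_feat):
        b, c, _, _ = rgb_feat.shape
        cat = torch.cat([rgb_feat, depth_feat], dim=1)   # (b, 2c, h, w)
        stats = cat.mean(dim=(2, 3))                     # global average pool -> (b, 2c)
        weights = self.mlp(stats).view(b, 2 * c, 1, 1)   # channel weights in (0, 1)
        return self.proj(cat * weights)                  # reweight, then fuse to c channels

fuse = ChannelAttentionFusion(64)
out = fuse(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))  # (2, 64, 32, 32)

Important channels receive weights near 1 and uninformative ones are suppressed before the 1x1 fusion, which matches the abstract's idea of supplementing feature information through weighted multi-channel fusion.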
Keywords/Search Tags:deep learning, multi-modal feature, semantic segmentation, RGB-D image