
Semantic Segmentation for Indoor Scenes with RGB-D Images

Posted on: 2016-09-24
Degree: Master
Type: Thesis
Country: China
Candidate: X L Feng
Full Text: PDF
GTID: 2308330473460860
Subject: Signal and Information Processing
Abstract/Summary:
As one of the core tasks of image scene understanding, semantic parsing has become a research hotspot in image processing and computer vision. Semantic labeling of indoor scenes, which must cope with a large variety of semantic categories, occlusions between multiple objects, a lack of distinctive features, and illumination changes, is one of the challenging research directions in image understanding. With depth sensors now widely available and thanks to the continuous efforts of computer vision researchers, multi-modal RGB-D datasets combining RGB texture and depth information can be acquired more easily than ever before. The richer depth information in such RGB-D data can be exploited to address the semantic labeling of indoor scenes, with great potential in visual computing.

This thesis presents a coarse-to-fine semantic parsing framework for RGB-D indoor scenes composed of two sub-parts: coarse-grained region-level semantic label inference and fine-grained pixel-level semantic label refinement. First, in the region-level part, superpixel region pools are constructed by hierarchical saliency-guided simple linear iterative clustering (SLIC) segmentation, and multi-modal regional features are used to classify the superpixel regions with a learned discriminative random decision forests model. Then, in the pixel-level part, a depth-guided pixel-wise fully-connected conditional random fields (CRFs) model with an internal recursive feedback updates the fine-grained semantic labels. Finally, a progressive global recursive feedback mechanism is introduced to iteratively update the semantic labels of the superpixel regions in the scene.

Because traditional fast unsupervised segmentation struggles to produce superpixel regions with consistent edges in cluttered indoor scenes, a hierarchical saliency-guided SLIC segmentation is proposed to implement the coarse-grained region-level semantic label inference, where the hierarchical saliency weakens the influence of small-scale salient patterns. First, hierarchical structures are extracted from three scale hierarchies of the RGB image of the indoor scene, the saliency of each hierarchy is computed, and the results are integrated into a unified hierarchical saliency map through a tree-based inference model. Then, the hierarchical saliency cue and the depth information are used to extend the clustering feature space of the SLIC segmentation beyond the three color components, yielding compact superpixels of the indoor scene. Next, several visual features of each superpixel region, namely the mass center, the mean of the HSV components, the HSV histogram, histograms of oriented gradients (HOG) on the RGB image, HOG on the depth image, and HOG on the normal map, are computed, normalized to unit vectors, and concatenated to form the multi-modal regional feature of the superpixel. Finally, the semantic label of each superpixel is predicted from its multi-modal regional feature by the learned random decision forests. A minimal sketch of this region-level stage is given below.
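To make the region-level stage concrete, the following is a minimal sketch under stated assumptions: it uses scikit-image SLIC and HOG and a scikit-learn random forest as stand-ins for the corresponding components, assumes the hierarchical saliency map and the normal map are precomputed, and the channel scalings, patch size, and forest size are illustrative choices rather than the thesis's parameters.

# Minimal sketch of the coarse-grained region-level stage.
# Assumptions: rgb is H x W x 3 with values in [0, 1], depth and saliency are
# H x W, normals is a precomputed H x W x 3 normal map. Names and parameter
# values are illustrative, not the thesis implementation.
import numpy as np
from skimage.color import rgb2hsv, rgb2lab
from skimage.feature import hog
from skimage.segmentation import slic
from skimage.transform import resize
from sklearn.ensemble import RandomForestClassifier

def saliency_depth_slic(rgb, depth, saliency, n_segments=500, compactness=10.0):
    """SLIC over a clustering space extended with depth and saliency channels."""
    lab = rgb2lab(rgb)
    depth_n = depth / (depth.max() + 1e-8)
    # Stack [L, a, b, depth, saliency]; convert2lab=False lets SLIC cluster
    # over the raw stacked channels, so depth and saliency guide the superpixels.
    stacked = np.dstack([lab, 100.0 * depth_n[..., None], 100.0 * saliency[..., None]])
    return slic(stacked, n_segments=n_segments, compactness=compactness, convert2lab=False)

def region_descriptor(mask, rgb, depth, normals, patch=(64, 64)):
    """Multi-modal regional feature: mass center, mean HSV, HSV histogram, and
    HOG over RGB / depth / normal crops, each L2-normalized, then concatenated."""
    ys, xs = np.nonzero(mask)
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
    hsv = rgb2hsv(rgb)[mask]

    def crop(img2d):
        # Fixed-size grayscale crop of the region's bounding box for HOG.
        return resize(img2d[y0:y1, x0:x1], patch)

    parts = [
        np.array([ys.mean() / rgb.shape[0], xs.mean() / rgb.shape[1]]),    # mass center
        hsv.mean(axis=0),                                                  # mean HSV
        np.concatenate([np.histogram(hsv[:, c], bins=8, range=(0, 1))[0]
                        for c in range(3)]).astype(float),                 # HSV histogram
        hog(crop(rgb.mean(axis=-1))),                                      # HOG on RGB
        hog(crop(depth)),                                                  # HOG on depth
        hog(crop(normals.mean(axis=-1))),                                  # HOG on normal map
    ]
    parts = [p / (np.linalg.norm(p) + 1e-8) for p in parts]                # unit vectors
    return np.concatenate(parts)

# Region labels are then predicted by random decision forests over these features.
forest = RandomForestClassifier(n_estimators=100)
# forest.fit(train_features, train_labels); coarse_labels = forest.predict(features)

The per-region labels (or label probabilities) produced by the forest serve as the coarse-grained result that the pixel-level CRFs stage described next takes as context and refines.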
Experiments on the NYU Depth v2 and SUN3D datasets indicate that the modified SLIC segmentation reduces the influence of small-scale salient patterns on superpixel generation, improves the quality of region boundaries, and produces more consistent and accurate semantic labels, while providing more reliable context information for refining the fine-grained semantic labels.

Since traditional indoor scene semantic segmentation methods do not sufficiently exploit geometric depth information for context inference, geometric depth cues and an internal recursive feedback mechanism are incorporated into the pixel-wise fully-connected CRFs model to improve fine-grained semantic label refinement. First, the depth information of the given scene and the intrinsic camera parameters are used to compute surface normal vectors from the corresponding 3D point cloud. Then, the pairwise potential of the pixel-wise fully-connected CRFs model is modified with the depth map and the surface normal map to infer the initial fine-grained semantic labels. Finally, the semantic labels are updated iteratively under a defined stopping criterion: if the criterion is met, the feedback process terminates and the current annotation is taken as the final fine-grained semantic label map; otherwise, the process returns to the previous step. Experiments on the NYU Depth v2 and SUN3D datasets demonstrate that the modified fully-connected CRFs inference reduces the influence of over-exposure and non-uniform indoor illumination compared with the traditional fully-connected CRFs model, while the internal recursive feedback mechanism improves the accuracy and stability of fine-grained pixel-level refinement and yields visually better annotations.

As it is hard for traditional indoor scene semantic segmentation approaches to choose a suitable scale for the annotation elements, a coarse-to-fine semantic parsing framework for RGB-D indoor scenes with a progressive global recursive feedback mechanism is proposed. First, the clustering feature space of the SLIC segmentation is extended with the fine-grained semantic annotation map as an additional perception channel. Then, whether the difference between the updated coarse-grained semantic annotation and the previously obtained labels falls below a certain threshold serves as the termination condition of the progressive global recursive feedback: if so, the feedback terminates and the updated coarse-grained annotation image becomes the final semantic annotation of the whole framework; otherwise, the process returns to the previous step. Unlike traditional purely region-level or pixel-level semantic segmentation techniques, the proposed framework introduces the progressive global recursive feedback to establish a seamless connection between coarse-grained region-level label inference and fine-grained pixel-level refinement, yielding more consistent semantic labels. Experiments on the NYU Depth v2 and SUN3D datasets indicate that the presented framework fuses the multi-modal visual information of RGB-D indoor scene images and achieves better results than traditional semantic segmentation methods that operate at a single annotation scale, whether region or pixel level. Minimal sketches of the surface normal estimation and of the depth-guided CRFs refinement with its recursive feedback are given below.
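As a concrete illustration of the first step of the pixel-level stage, the sketch below recovers a surface normal map from the depth image and the intrinsic camera parameters by back-projecting every pixel to a 3D point and crossing the image-row and image-column tangent vectors; the parameter names fx, fy, cx, cy and the finite-difference scheme are assumptions, not the thesis's exact formulation.

# Sketch: surface normals from a depth map and intrinsic camera parameters.
# fx, fy, cx, cy are assumed intrinsics (e.g. the Kinect calibration shipped
# with NYU Depth v2); depth is an H x W array in metric units.
import numpy as np

def normals_from_depth(depth, fx, fy, cx, cy):
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project every pixel to a 3D point (X, Y, Z) in camera coordinates.
    pts = np.dstack([(u - cx) * depth / fx,
                     (v - cy) * depth / fy,
                     depth])
    # Tangent vectors along image columns/rows; the normal is their cross product.
    dx = np.gradient(pts, axis=1)
    dy = np.gradient(pts, axis=0)
    n = np.cross(dx, dy)
    n /= (np.linalg.norm(n, axis=2, keepdims=True) + 1e-8)
    return n  # H x W x 3 unit normal map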
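The next sketch shows one plausible realization of the depth-guided fully-connected CRFs refinement with an internal recursive feedback, built on the publicly available pydensecrf bindings of the dense CRF of Krähenbühl and Koltun; the kernel widths, the compatibility weight, the iteration counts, and the stopping threshold are illustrative assumptions rather than the thesis's settings.

# Sketch: depth-guided fully-connected CRFs refinement with recursive feedback.
# Assumptions: rgb in [0, 1] (H x W x 3), depth in meters (H x W), normals from
# the sketch above, coarse_probs of shape (n_labels, H, W) from the region stage.
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax, create_pairwise_bilateral

def depth_guided_crf(rgb, depth, normals, coarse_probs, n_iters=5):
    n_labels, h, w = coarse_probs.shape
    d = dcrf.DenseCRF(h * w, n_labels)
    d.setUnaryEnergy(unary_from_softmax(coarse_probs))
    # Appearance kernel extended with depth and surface-normal channels, so
    # pixels lying on the same 3D surface are encouraged to share a label.
    guide = np.concatenate([rgb, depth[..., None], normals], axis=2).astype(np.float32)
    feats = create_pairwise_bilateral(sdims=(60, 60),
                                      schan=(0.1, 0.1, 0.1, 0.5, 0.3, 0.3, 0.3),
                                      img=guide, chdim=2)
    d.addPairwiseEnergy(feats, compat=10)
    q = np.array(d.inference(n_iters)).reshape(n_labels, h, w)
    return q.argmax(axis=0), q

def recursive_refinement(rgb, depth, normals, coarse_probs, max_rounds=5, tol=0.01):
    """Internal recursive feedback: stop once few pixel labels still change."""
    labels, q = depth_guided_crf(rgb, depth, normals, coarse_probs)
    for _ in range(max_rounds):
        new_labels, q = depth_guided_crf(rgb, depth, normals, q)
        if np.mean(new_labels != labels) < tol:   # stopping criterion met
            return new_labels
        labels = new_labels
    return labels

Appending the depth and normal channels to the bilateral features is what lets pixels on the same physical surface share labels even under over-exposure or non-uniform illumination, which is the role the abstract assigns to the depth-guided pairwise potential; the outer loop plays the part of the internal recursive feedback, while the thesis's progressive global feedback would additionally feed the resulting annotation map back into the SLIC clustering space.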
Keywords/Search Tags: Indoor scene, Semantic parsing, RGB-D image, SLIC segmentation, Fully-connected CRFs, Recursive feedback mechanism