
Indoor Object Detection And Semantic Segmentation Based On Depth Estimation And Feature Fusion

Posted on: 2021-08-10
Degree: Master
Type: Thesis
Country: China
Candidate: H H Liu
GTID: 2518306107488584
Subject: Instrument Science and Technology
Abstract:
With the development of image processing technology, object detection and semantic segmentation have gradually become a research focus. Because indoor scenes are closely tied to everyday human life, object detection and semantic segmentation in indoor scenes have both research significance and practical application prospects. Combining color and depth images captures the useful information in a scene more completely than either modality alone. This thesis therefore studies object detection and semantic segmentation in indoor scenes from the perspective of depth estimation and multi-scale feature fusion.

For depth estimation, a stereo matching method that combines multi-scale local features with deep features is proposed to address the difficulty of finding accurate matching points in ill-posed regions. The feature fusion stage has two parts. The first extracts shallow features that combine Log-Gabor features and local binary pattern features, and fuses them across different scales. The second extracts deep features with a convolutional neural network, and then concatenates the multi-scale shallow fusion features with the deep features to form features that contain both semantic and structural information. In addition, positive and negative samples are constructed by adding noise of different intensities in the direction perpendicular to the epipolar line, which reduces the error caused by imprecise epipolar alignment in the images. The proposed binocular stereo matching model is compared with its variants and with classical methods on the KITTI dataset. The results show that the proposed method performs better on image details and is competitive with the other methods.

For object detection using color and depth images jointly, an object detection method that combines an attention mechanism with feature fusion is proposed to alleviate insufficient feature expression and large differences in object scale in indoor scenes. First, correlation features of the color and depth images are obtained by pixel-level fusion with the non-subsampled contourlet transform, which makes regional features in the image more distinct and improves the robustness of the convolutional neural network to rotation. To compensate for the lack of spatial information when extracting features from the color image alone, a dual-stream convolutional neural network extracts features from the color image and from the correlation features separately, and the two streams are fused nonlinearly by a multi-layer perceptron at corresponding layers. To handle large scale differences in the image, an attention mechanism selectively fuses the features of different layers to obtain more discriminative features. The method is evaluated on the NYUDv2 dataset and compared with existing classical methods. The comparison supports the rationality and effectiveness of the method and shows a performance improvement.

For semantic segmentation using color and depth images jointly, a semantic segmentation method based on dual-stream Gabor convolutional network fusion is proposed to alleviate the problems of illumination changes, mutual occlusion of objects, and the large number of semantic categories in indoor scenes. To obtain direction-invariant and scale-invariant features, a weighted Gabor directional filter is designed to replace the traditional convolution filter, and features that benefit semantic segmentation are then extracted. In addition, to keep the feature extraction network lightweight, wide residual blocks extract features from the color and depth images separately, and the extracted features are fused at multiple scales by a pyramid pooling module to obtain image context information. The method is evaluated on the NYUDv2 dataset and compared with existing classical methods and with variants of itself. The results show a performance improvement and indicate that the configuration of each module is reasonable.
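
As an illustration of the shallow-feature branch of the stereo matching method, the following is a minimal sketch (not the thesis's implementation) of local binary pattern codes computed at two scales, with their histograms concatenated into one descriptor. The function names, the radii 1 and 2, the square (non-interpolated) neighbour sampling, and the use of histogram concatenation as the fusion step are all assumptions made for illustration. The Log-Gabor and convolutional branches are omitted.

```python
from typing import List, Sequence

# Eight neighbour directions, clockwise from the top-left pixel.
# The starting point and order are an arbitrary but fixed convention (assumption).
_DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image: List[List[int]], row: int, col: int, radius: int = 1) -> int:
    """Return the 8-bit LBP code of the pixel at (row, col).

    Neighbours are sampled at integer offsets scaled by `radius`
    (a simplification of the usual circular sampling).
    """
    center = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(_DIRECTIONS):
        # A neighbour at least as bright as the centre sets the corresponding bit.
        if image[row + dr * radius][col + dc * radius] >= center:
            code |= 1 << bit
    return code


def lbp_image(image: List[List[int]], radius: int = 1) -> List[List[int]]:
    """Compute LBP codes for every pixel that has a full neighbourhood."""
    rows, cols = len(image), len(image[0])
    return [
        [lbp_code(image, r, c, radius) for c in range(radius, cols - radius)]
        for r in range(radius, rows - radius)
    ]


def lbp_histogram(codes: List[List[int]], bins: int = 256) -> List[float]:
    """Return the normalised histogram of LBP codes (all zeros if there are none)."""
    hist = [0.0] * bins
    for code_row in codes:
        for code in code_row:
            hist[code] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def multi_scale_descriptor(image: List[List[int]], radii: Sequence[int] = (1, 2)) -> List[float]:
    """Concatenate LBP histograms computed at each radius (hypothetical fusion step)."""
    descriptor: List[float] = []
    for radius in radii:
        descriptor.extend(lbp_histogram(lbp_image(image, radius)))
    return descriptor
```

With the default radii, `multi_scale_descriptor` returns a 512-element vector: 256 bins for each scale. A real matching pipeline would compute such descriptors on patches from the left and right images and compare them, which is outside the scope of this sketch.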
Keywords: Indoor Scene, Binocular Stereo Matching, Multi-scale Feature Fusion, Object Detection, Semantic Segmentation