Indoor Object Detection And Semantic Segmentation Based On Depth Estimation And Feature Fusion

Posted on:2021-08-10

Degree:Master

Type:Thesis

Country:China

Candidate:H H Liu

Full Text:PDF

GTID:2518306107488584

Subject:Instrument Science and Technology

Abstract/Summary:

With the continuing development of image processing technology, object detection and semantic segmentation have gradually become research focuses. Because indoor scenes are closely tied to human life, object detection and semantic segmentation in indoor scenes have both research significance and practical application prospects. Combining color and depth images integrates the useful information in a scene more comprehensively than color alone. This thesis therefore studies object detection and semantic segmentation in indoor scenes from the perspective of depth estimation and multi-scale feature fusion.

For depth estimation, accurate matching points are hard to find in ill-posed regions. To address this, a stereo matching method is proposed that combines multi-scale local features with deep features. The feature fusion stage has two parts. The first extracts shallow features that combine Log-Gabor features with local binary pattern (LBP) features and fuses them across scales. The second extracts deep features with a convolutional neural network. The multi-scale shallow fusion features and the deep features are then concatenated to form features that carry both semantic and structural information. In addition, positive and negative training samples are constructed by adding noise of different intensities in the direction perpendicular to the epipolar line, which reduces the error caused by imprecise epipolar alignment in the images. The proposed binocular stereo matching model is compared with its variants and with classical methods on the KITTI dataset. The results show that it performs better on image detail and is competitive with other methods.

For object detection from joint color and depth images, an object detection method combining an attention mechanism with feature fusion is proposed to alleviate insufficient feature expression and the large differences in object scale found in indoor scenes. First, correlation features of the color and depth images are obtained by pixel-level fusion using the non-subsampled contourlet transform (NSCT), which makes regional features in the image more prominent and improves the convolutional neural network's robustness to rotation. To compensate for the lack of spatial information when extracting features from the color image alone, a dual-stream convolutional neural network extracts features from the color image and from the correlation features separately, and a multi-layer perceptron fuses them nonlinearly at corresponding layers. To handle the large scale differences in the image, an attention mechanism selectively fuses features from different layers to obtain features with a specific representation. Experiments on the NYUDv2 dataset and comparisons with existing classical methods verify that the proposed method is sound and effective and yields a certain performance improvement.

For semantic segmentation from joint color and depth images, a method based on dual-stream Gabor convolutional network fusion is proposed to alleviate the problems of illumination changes, mutual occlusion of objects, and the large number of semantic categories in indoor scenes. To obtain direction-invariant and scale-invariant features, a weighted Gabor directional filter is designed to replace the traditional convolution filter and extract features that benefit semantic segmentation. In addition, to keep the feature extraction network lightweight, wide residual blocks extract features from the color and depth images separately, and a pyramid pooling module then fuses the extracted features across scales to capture image context information. Experiments on the NYUDv2 dataset, with comparisons against existing classical methods and variants of the proposed method, verify that the method yields a certain performance improvement and that the configuration of each module is reasonable.
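
The abstract does not give the form of the weighted Gabor directional filter, so the following is only a minimal sketch of one common way such a filter can be built: a bank of real Gabor kernels at several orientations multiplies a learned convolution kernel element-wise, and each orientation gets its own scalar weight. The function names (gabor_kernel, gabor_bank, modulate), the parameter defaults, and the per-orientation weighting scheme are assumptions made for illustration, not the thesis's implementation.

```python
import numpy as np


def gabor_kernel(size, theta, sigma=2.0, lambd=4.0, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor filter, shape (size, size); size should be odd.

    theta is the orientation in radians. All defaults are illustrative.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinate grid by theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    # Gaussian envelope multiplied by a cosine carrier along x_r.
    envelope = np.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_r / lambd + psi)
    return envelope * carrier


def gabor_bank(size=5, n_orient=4):
    """Stack of Gabor kernels at n_orient evenly spaced orientations in [0, pi)."""
    thetas = np.arange(n_orient) * np.pi / n_orient
    return np.stack([gabor_kernel(size, t) for t in thetas])


def modulate(weights, bank, orient_weights):
    """Modulate one learned kernel by every orientation in the bank.

    weights:        (size, size) learned convolution kernel
    bank:           (n_orient, size, size) Gabor kernels
    orient_weights: (n_orient,) assumed per-orientation scalar weights
    returns:        (n_orient, size, size) orientation-specific kernels
    """
    return orient_weights[:, None, None] * bank * weights[None, :, :]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    learned = rng.standard_normal((5, 5))   # stand-in for a trained kernel
    bank = gabor_bank(size=5, n_orient=4)
    alphas = np.full(4, 0.25)               # uniform weights, for illustration
    kernels = modulate(learned, bank, alphas)
    print(kernels.shape)
```

In a network, each of the resulting orientation-specific kernels would be used in place of the plain learned kernel, and the orientation weights could themselves be trained.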

Keywords/Search Tags:

Indoor Scene, Binocular Stereo Matching, Multi-scale Feature Fusion, Object Detection, Semantic Segmentation

Related items

1	Research On Binocular Stereo Vision Based On Multi-scale And Convolutional Neural Network
2	Research On 3D Scene Reconstruction From Binocular Stereo Pairs
3	Research Of Efficient Semantic Segmentation Methods For Scene Perception
4	Research On End-To-End Binocular Stereo Matching Algorithm Based On Deep Learning
5	Research On Object Detection Algorithm Based On Multi-Scale Semantic Information Fusion
6	Object Location Of Binocular Stereo Vision Base On Multi-scale Feature
7	A Binocular Stereo Matching Approach Based On Dual Fusion
8	Research On Text Detection Technology In Natural Scene Images Based On Multi-Scale Feature Fusion And Instance Segmentation
9	Research On Moving Object Detection And Tracking With Stereo Vision Technology In Static Scene
10	A Semantic Segmentation Algorithm Using Multi-scale Feature Fusion With Combination Of Superpixel Segmentation