
Research On Indoor Scene Multi-Modal Sensing Based On Active Vision

Posted on: 2020-04-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Q Fang
Full Text: PDF
GTID: 1488306344959629
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Multi-modal sensing based on active vision refers to techniques that acquire multi-modal image data with an active sensor projecting a controllable light source. Compared with single-modality data such as color images, multi-modal images provide more information and are better suited to indoor scenes, which contain a wide variety of objects. Multi-modal sensing of indoor scenes still faces many key technical difficulties, such as uneven scene illumination, object occlusion, high accuracy requirements, dynamic scenes, and difficult data fusion. Building on an analysis and summary of domestic and international research, this dissertation proposes methods for several active vision-based multi-modal sensing tasks, including depth perception, RGB-D image data fusion (superpixel segmentation), albedo estimation, 3D reconstruction, and object detection using multi-modal data, thereby realizing active vision-based multi-modal sensing of indoor scenes. The main work and innovations of this dissertation are as follows.

Because active vision-based depth perception in indoor environments suffers from illumination disturbance and occlusion, a structured-light depth perception method based on a convolutional autoencoder is proposed. The autoencoder removes noise from the laser image of the structured-light system, thereby improving the accuracy of the depth measurement. To reduce the amount of annotated data, small image blocks are defined as the autoencoder's input, and on this basis a small-sample dataset for image denoising is established and used to train the deep convolutional autoencoder. The method suppresses external noise in the structured-light system and greatly improves the accuracy of depth perception.

To address the complex steps of multi-modal data fusion and the inefficient feature extraction of high-level visual tasks on RGB-D images, a superpixel segmentation method for RGB-D images is proposed. The method adopts a clustering framework whose criterion combines color similarity, spatial proximity, and geometric similarity between pixels. In addition, an RGB-D coplanarity feature and content-adaptive weights are introduced to enable high-speed linear iteration. The method effectively extracts multi-modal features from indoor-scene RGB-D images and generates a full-coverage superpixel segmentation result.

Because albedo estimation faces the problems of slow speed, low accuracy, and complicated equipment, a robust estimation method for near-infrared albedo based on a nonstationary stochastic process is proposed. The method computes an initial albedo from the output of the Kinect V2 sensor and establishes an additive-noise model of albedo. At the same time, the concept of shading robust estimation is introduced to simplify the nonstationary stochastic-process model of albedo. The resulting albedo estimates outperform other denoising algorithms and are suitable for high-precision estimation of albedo images in indoor scenes.

To fuse multi-modal data comprising color, depth, and infrared albedo images (RGB-D-A images), and to explore multi-modal vision further, a 3D reconstruction approach based on multi-modal registration and an object detection method using multi-modal features are proposed. The 3D reconstruction method calibrates the color and infrared cameras of the acquisition device to register the RGB-D-A images, and estimates the color albedo from the infrared albedo to realize 3D reconstruction of a color-albedo point cloud. The object detection method fuses multi-modal features and uses sub-superpixels to generate semantic cuboid candidates, which are then detected with multiple classifiers and a ranking algorithm. The method not only recovers the pose of each object in 3D space but also achieves accurate object classification.

These methods confirm that the multi-modal data perceived in this dissertation play a crucial role in computer vision algorithms and demonstrate the value of multi-modal images in applications.
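As a concrete illustration of the small-sample dataset idea behind the autoencoder-based denoising method, the sketch below cuts an aligned noisy/clean image pair into small blocks. The patch size, stride, and function name are illustrative assumptions; the dissertation's actual autoencoder architecture is not reproduced here.

```python
import numpy as np

def extract_patches(noisy, clean, patch=8, stride=8):
    """Cut an aligned noisy/clean image pair into small blocks.

    Small patches multiply the effective number of training samples,
    so only a handful of annotated laser images are needed -- the
    'small-sample' dataset idea described in the abstract.
    """
    H, W = noisy.shape
    xs, ys = [], []
    for r in range(0, H - patch + 1, stride):
        for c in range(0, W - patch + 1, stride):
            xs.append(noisy[r:r + patch, c:c + patch])
            ys.append(clean[r:r + patch, c:c + patch])
    return np.stack(xs), np.stack(ys)

# A single 64x64 image pair already yields 64 training blocks.
rng = np.random.default_rng(0)
clean = rng.random((64, 64))
noisy = clean + 0.05 * rng.standard_normal((64, 64))
X, Y = extract_patches(noisy, clean)
print(X.shape)  # (64, 8, 8)
```

Each (noisy block, clean block) pair then serves as one (input, target) training example for the denoising autoencoder.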
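The clustering criterion of the RGB-D superpixel method, combining color similarity, spatial proximity, and geometric similarity, can be sketched as a weighted distance in the style of SLIC. The weight values, field names, and the use of a simple depth difference as the geometric term below are assumptions for illustration, not the dissertation's exact formulation (which also uses a coplanarity feature and content-adaptive weights).

```python
import numpy as np

def rgbd_distance(px, center, w_c=1.0, w_s=0.5, w_g=0.5):
    """Clustering criterion between one pixel and one cluster center.

    px / center: dicts with 'lab' (3,), 'xy' (2,), and scalar 'depth'.
    w_c, w_s, w_g trade off color similarity, spatial proximity, and
    geometric (depth) similarity; letting them vary with local image
    content gives the content-adaptive variant.
    """
    d_color = np.linalg.norm(px['lab'] - center['lab'])
    d_space = np.linalg.norm(px['xy'] - center['xy'])
    d_geom = abs(px['depth'] - center['depth'])
    return w_c * d_color + w_s * d_space + w_g * d_geom

# Each pixel is assigned to the nearest center under this distance,
# then centers are updated -- the usual linear-iteration loop.
px = {'lab': np.array([50.0, 0.0, 0.0]),
      'xy': np.array([10.0, 10.0]), 'depth': 1.0}
center = {'lab': np.array([52.0, 1.0, 0.0]),
          'xy': np.array([12.0, 9.0]), 'depth': 1.1}
print(rgbd_distance(px, center))
```

Because every pixel is compared only against nearby centers, the iteration stays linear in the number of pixels, which is the source of the method's speed.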
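Under the additive-noise model of albedo, a per-pixel median over repeated observations is a minimal example of a robust estimator. It stands in for, and does not reproduce, the nonstationary stochastic-process estimator described above; the function name and frame-stack input are assumptions.

```python
import numpy as np

def robust_albedo(frames):
    """Per-pixel robust estimate from a stack of noisy albedo frames.

    Under an additive-noise model A_obs = A + n, the per-pixel median
    is far less sensitive to outliers (speckle, occlusion artefacts)
    than the mean -- a simple stand-in for the robust estimation
    concept in the abstract.
    """
    return np.median(np.stack(frames), axis=0)

# One grossly corrupted frame does not perturb the median estimate.
A = np.full((4, 4), 0.5)
est = robust_albedo([A, A, A + 10.0])
print(est[0, 0])  # 0.5
```

The mean of the same stack would be pulled to roughly 3.8 per pixel, which is why a robust statistic matters when the noise is nonstationary and heavy-tailed.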
Keywords/Search Tags:multi-modal sensing, active vision, depth perception, albedo estimation, object detection