Research on Salient Object Detection in RGB-D Images

Posted on: 2021-08-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F F Liang
Full Text: PDF
GTID: 1488306470964899
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the rapid development of artificial intelligence, visual saliency detection has become an important area of applied research in intelligent vision. It plays an important role in tasks such as driver assistance and video surveillance. Although humans readily use visual saliency to select the important information in an image, giving computers the same ability as human binocular vision remains a difficult problem. Drawing on biological vision theory and machine learning, this dissertation aims to build an RGB-D saliency detection model that is more consistent with human visual cognition. It addresses two questions in RGB-D visual saliency modeling: how to effectively extract multimodal color and depth features, and how to effectively fuse them. The research contents and innovations are as follows:

1. A method guided by contrast and depth background priors is proposed for computing bottom-up visual saliency in RGB-D images. The method uses a single framework to account for the influence of both the color and depth modalities on bottom-up saliency factors. A background prior is constructed on the disparity map to compute the depth-channel saliency map, and a contrast prior is applied to the color channel, so that the two saliency maps can be integrated. Comparative experiments on public datasets show that the framework achieves better RGB-D saliency detection results.

2. A method for computing RGB-D visual saliency based on metric learning is proposed to address scene adaptability. A two-stream convolutional network extracts color and depth features separately and projects them into a high-dimensional metric space. A new multimodal metric loss term is added to the cross-entropy loss function to guide the learning of multimodal features that discriminate between salient and non-salient objects, which improves the generalization performance of the model. Experiments on public datasets show that the framework effectively learns high-level attribute features and improves the generalization ability of the saliency model.

3. A method for computing RGB-D visual saliency based on a deep-fusion two-stream convolutional neural network is proposed to address multi-level feature fusion across the color and depth channels. First, the two-stream convolutional network extracts hierarchical features from the color and depth modalities. Multiple convolutional layers then fuse the features of the two modalities at different resolutions, which yields multimodal, multi-resolution hierarchical feature extraction and adaptive feature fusion. In addition, a long short-term memory (LSTM) network models the spatial dependencies among the multimodal, multi-scale content. Experimental results show that the framework is robust across different scenes.

In conclusion, the dissertation explores the fusion of multimodal features at different levels in RGB-D visual saliency modeling. Extensive experiments demonstrate the effectiveness of the different fusion methods for RGB-D images. This work lays a foundation for the development of saliency computation models for stereo vision.
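The idea in the second contribution, adding a metric term to the cross-entropy loss, can be illustrated with a minimal sketch. The abstract does not give the form of the metric term, so the pairwise contrastive term below is a stand-in and not the dissertation's formulation. The function names, the `weight` and `margin` parameters, and the assumption that each sample already has a fused color and depth embedding are all hypothetical.

```python
import numpy as np


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean cross-entropy between predicted saliency in [0, 1] and binary labels."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred)))


def contrastive_metric_loss(embeddings, labels, margin=1.0):
    """Stand-in metric term (assumption, not the dissertation's loss).

    Pulls embeddings with the same label together and pushes embeddings
    with different labels at least `margin` apart, averaged over all pairs.
    """
    n = embeddings.shape[0]
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.sqrt(np.sum(diffs ** 2, axis=-1))
    same = labels[:, None] == labels[None, :]
    pairs = np.triu(np.ones((n, n), dtype=bool), k=1)  # each unordered pair once
    n_pairs = int(pairs.sum())
    if n_pairs == 0:
        return 0.0
    pull = dists[same & pairs] ** 2
    push = np.maximum(0.0, margin - dists[~same & pairs]) ** 2
    return float((pull.sum() + push.sum()) / n_pairs)


def combined_loss(pred, target, embeddings, labels, weight=0.5, margin=1.0):
    """Cross-entropy plus a weighted metric term; `weight` is a hypothetical hyperparameter."""
    return binary_cross_entropy(pred, target) + weight * contrastive_metric_loss(
        embeddings, labels, margin
    )


if __name__ == "__main__":
    labels = np.array([1, 1, 0, 0])              # 1 = salient, 0 = non-salient
    target = labels.astype(float)
    pred = np.array([0.9, 0.8, 0.2, 0.1])        # toy saliency predictions
    separated = np.array([[0.0, 0.0], [0.0, 0.0], [5.0, 5.0], [5.0, 5.0]])
    collapsed = np.zeros((4, 2))
    print("separated embeddings:", combined_loss(pred, target, separated, labels))
    print("collapsed embeddings:", combined_loss(pred, target, collapsed, labels))
```

With identical predictions, the collapsed embeddings incur a larger total loss than the well-separated ones, which is the kind of pressure a metric term is meant to add on top of cross-entropy.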
Keywords/Search Tags: RGB-D image, visual saliency, prior modeling, multimodal feature fusion, content dependence