Research On Salient Object Detection For Images And Videos

Posted on: 2020-11-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Z Ji
GTID: 1368330614450816
Subject: Computer application technology
Abstract/Summary:
With the rapid development of computer vision, salient object detection, a visual perception task that simulates the visual attention mechanism, has attracted growing research interest in recent years. The human visual attention system naturally extracts the most informative objects and regions in an image and then combines this local information to understand the whole scene efficiently. This mechanism has prompted many researchers to simulate the same ability in computer vision tasks. Salient object detection aims to find the most attractive object(s) in a scene in order to simulate the functionality of the biological visual attention system. The goal of this dissertation is therefore to develop algorithms and models that simulate the attention mechanism of the human visual system, so that salient object detection can efficiently emphasize the important local information. Analyzing the problems of bottom-up and top-down methods, which correspond to the exogenous and endogenous aspects of visual saliency, is important for improving these algorithms and models and for advancing research on visual attention mechanisms. Moreover, according to the different characteristics of image and video data, this thesis develops a series of improved algorithms and novel models for salient object detection from the perspectives of unsupervised and supervised schemes, transfer of model generalization capability, and spatial-temporal information modeling for video data. Some competitive research results were achieved. The major research innovations and contributions of this dissertation are as follows:

(1) Traditional bottom-up methods that rely on a single saliency prior are sensitive to that prior and tend to fail to capture salient objects against complex backgrounds. To reduce this sensitivity, a bottom-up method based on graph manifold ranking that considers both objectness and multiple saliency priors is proposed. Specifically, the algorithm uses the geodesic distance between any two superpixels to construct the affinity matrix and the un-normalized Laplacian matrix of the graph. A saliency optimization method then refines each saliency map generated by manifold ranking with respect to the first-stage query, and a multilayer cellular automata algorithm integrates the saliency maps corresponding to different features in the final stage. Complexity analysis and ablation studies covering each component, each step, and the baseline methods demonstrate the efficiency of the proposed method. Cross-dataset validation on several benchmark datasets, in comparison with state-of-the-art unsupervised and deep-learning-based methods, further characterizes the capability of the method, including its application limitations and algorithmic bottlenecks.

(2) Given the limitations of unsupervised methods and the lack of research on salient object detection with generative adversarial networks (GANs), we propose to perform salient object detection with a conditional GAN (cGAN) for both image-to-saliency and saliency-to-image translation. The proposed model is inspired by the application of GANs to image translation tasks: the Wasserstein distance and an L1-norm loss are introduced to improve the stability of cGAN training, and saliency map prediction is cast as a saliency segmentation task trained on paired image and ground-truth saliency samples. To further investigate the potential and feasibility of cGANs for saliency detection, we also train the cGAN to capture saliency-to-context information by reversing the translation direction, from saliency mask to real image. Ablation analysis demonstrates the efficiency of the proposed model, and the trained generator achieves performance comparable to the state of the art on saliency segmentation. An extension study on saliency-to-context translation also suggests that the proposed model can produce reasonable results for mining the semantic co-occurrence between a salient object and its related image context.

(3) In salient object detection models based on fully convolutional networks (FCNs), poor preservation of local detail and weak summarization of context information cause the loss of details such as edges, corners, and boundaries, which degrades the accuracy of the predicted saliency map. We therefore propose a novel deep convolutional neural network (CNN) that introduces a spatial and channel-wise attention layer into a multi-scale encoder-decoder framework. In the proposed model, the attention CNN layer aligns the context information between feature maps at different scales. In addition, a structure with multi-scale side outputs is designed to produce more accurate, edge-preserving saliency maps by integrating saliency maps at different scales under a deeply supervised architecture. Experimental results on several benchmark datasets demonstrated the effectiveness of the proposed model. An ablation study validated the efficiency of the proposed modules in preserving local details and in fusing multi-scale context information to improve performance. An extension study also demonstrated the potential and feasibility of applying the trained model to other object-driven vision tasks as an efficient preprocessing step.

(4) To balance the trade-off between accuracy and model size in recurrent neural network (RNN) based models, and to address the model complexity of multi-branch networks for modeling motion information, we present a novel cross-attention based encoder-decoder model under the Siamese framework (CASNet) for video salient object detection. Specifically, a baseline encoder-decoder model is adopted as the backbone network, transferring its generalization ability for intra-frame salient object detection. Self- and cross-attention modules are incorporated into the model to preserve saliency correlation and improve the consistency of intra-frame saliency detection. These modules are embedded into a Siamese framework in order to preserve the short-term spatial-temporal saliency correlation and, at the same time, to increase the consistency of saliency detection between two adjacent video frames. Extensive cross-dataset validation demonstrates the high performance of the proposed method in comparison with state-of-the-art methods. Quantitative ablation results indicate the efficiency of the proposed modules, as well as the effectiveness and feasibility of transferring an image-based model to video salient object detection.
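
The graph construction described in contribution (1) can be illustrated with a minimal sketch. It assumes NumPy, a toy chain of five "superpixels" with a single intensity feature, an exponential kernel exp(-d / sigma) for turning geodesic distances into affinities, and the commonly used closed form f = (D - alpha W)^{-1} y for ranking with the un-normalized Laplacian. The function names, the kernel, and the values of `alpha` and `sigma` are hypothetical choices for illustration; the abstract does not specify them. Objectness, multiple priors, saliency optimization, and the cellular automata fusion are not modeled here.

```python
import numpy as np


def geodesic_distances(features, edges):
    """All-pairs geodesic distance over a superpixel adjacency graph.

    Edge cost is the feature distance between adjacent superpixels; the
    geodesic distance is the cheapest path cost (Floyd-Warshall).
    """
    n = len(features)
    dist = np.full((n, n), np.inf)
    np.fill_diagonal(dist, 0.0)
    for i, j in edges:
        cost = float(np.linalg.norm(features[i] - features[j]))
        dist[i, j] = dist[j, i] = cost
    for k in range(n):
        # Relax every pair (i, j) through intermediate node k.
        dist = np.minimum(dist, dist[:, k:k + 1] + dist[k:k + 1, :])
    return dist


def manifold_ranking(dist, query, alpha=0.99, sigma=0.1):
    """Rank nodes against a query vector: f = (D - alpha * W)^{-1} y.

    alpha and sigma are illustrative values, not taken from the source.
    Assumes a connected graph so that every node has positive degree.
    """
    affinity = np.exp(-dist / sigma)      # assumed kernel on geodesic distance
    np.fill_diagonal(affinity, 0.0)       # no self-loops
    degree = np.diag(affinity.sum(axis=1))
    scores = np.linalg.solve(degree - alpha * affinity, query)
    # Min-max normalize so the scores can be read as a saliency value in [0, 1].
    return (scores - scores.min()) / (scores.max() - scores.min())


# Toy example: nodes 0-1 are dark "background", nodes 2-4 are a bright "object".
features = np.array([[0.0], [0.1], [0.9], [1.0], [0.95]])
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
query = np.array([0.0, 0.0, 0.0, 1.0, 0.0])   # node 3 is the seed

scores = manifold_ranking(geodesic_distances(features, edges), query)
print(np.round(scores, 3))
```

In this toy setup the seed node receives the highest score and the other bright nodes rank above the dark ones, because paths that cross the large intensity jump between nodes 1 and 2 have a high geodesic cost and therefore a weak affinity.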
Keywords/Search Tags: Salient Object Detection, Graph-based Manifold Ranking, Convolutional Neural Networks, Generative Adversarial Network, Encoder-decoder Models, Attention Mechanism