
Research On Visual Salient Object Detection Via Graph Fusion Of Spatiotemporal Features

Posted on: 2022-01-10    Degree: Doctor    Type: Dissertation
Country: China    Candidate: M Z Xu    Full Text: PDF
GTID: 1488306569984109    Subject: Information and Communication Engineering
Abstract/Summary:
Visual salient object detection (SOD) is a hot research direction in the field of computer vision. Its objective is to establish a heuristic or learning model that simulates the human visual attention mechanism, locates the most conspicuous objects in a visual scene, and highlights them uniformly against the background. Compared to image SOD, video SOD is a more challenging task. This can be attributed to the many unconstrained complex visual scenes exhibited in dynamic videos, such as moving backgrounds, camera shake, small objects, deformable objects, object occlusion, and low contrast between foreground and background. In these complex dynamic visual scenes, existing detection models still suffer from problems such as interference from complex moving backgrounds, poor spatiotemporal consistency of salient objects, and coarse salient object boundaries. Therefore, they cannot perform the visual SOD task well in complex visual scenes. The key points for improving the performance of visual SOD are the extraction of robust spatiotemporal saliency features and the fusion of complementary spatiotemporal features, both of which can be achieved by exploiting spatiotemporal context information. Graph-based SOD models have been widely researched for their effectiveness in modeling spatiotemporal context relations. Nevertheless, many challenges remain in constructing a robust video SOD model for complex dynamic visual scenes. The first challenge is how to establish robust spatiotemporal saliency features for scenes with complex moving backgrounds, and how to design a graph fusion method that exploits the complementary advantages of various features, so as to effectively filter out the interference of irrelevant moving backgrounds. The second challenge is how to mine the spatiotemporal constraint information among graph nodes in complex scenes with low contrast between foreground and background, deformable objects, motion blur, and so on, and how to extend existing graph model theory to integrate spatiotemporal features complementarily, so as to improve the continuous consistency of the saliency map. The third challenge is how to design a supervised deep graph learning method that mines and aggregates important saliency information over spatiotemporal graph-structured data when sufficient training data are available, so as to improve the ability to preserve the fine edges of salient objects. In view of the above challenges, this dissertation studies the extraction and fusion of spatiotemporal features under the framework of graph model theory and investigates video SOD from different perspectives. Specifically, the main content of this dissertation is divided into the following three aspects:

Firstly, aiming at the suppression of complex moving background interference, this dissertation proposes a visual SOD method via graph clustering with motion energy and spatiotemporal objectness, from the perspective of spatiotemporal saliency feature extraction and fusion. More precisely, this dissertation proposes an effective method for modeling motion energy, which exploits motion magnitude, motion orientation, the gradient flow field, and the spatial gradient within a single frame, and is expected to enhance the representation of the salient object. To generate a more compact spatiotemporal objectness map and reduce the salient object regions to be detected, this dissertation extracts a spatiotemporal objectness map in a novel way, mining the relationships among the object proposals of the previous frame, the salient object map of the previous frame, and the object proposals of the current frame. The final salient object is estimated, and background interference suppressed, via graph clustering over these two robust spatiotemporal features. Extensive experiments and analysis demonstrate the effectiveness of the proposed method in reducing the interference of complex moving backgrounds; it can effectively handle visual scenes with complex moving backgrounds.

Secondly, aiming at improving the spatiotemporal consistency of saliency maps, this dissertation proposes a visual SOD method via robust seed extraction and multi-graph spatiotemporal propagation, from the perspective of spatiotemporal constraint information mining and multi-graph spatiotemporal feature fusion under the manifold regularization framework. More precisely, reliable saliency seeds are first generated using a graph clustering method. Regional consistency constraints are then modeled using the coarse saliency seeds, the connection relationships between graph nodes in the spatiotemporal graph are reconstructed, and prior information from image cluster segmentation is introduced to optimize the graph edge weights. The manifold regularization framework is then extended from a single-graph model to a multi-graph model, and the reliable saliency seed regions are used as query nodes to propagate saliency information through the spatiotemporal graph model, realizing the complementary fusion of spatial and temporal features and improving the consistency of the salient object. Extensive experiments and analysis demonstrate the effectiveness of the proposed method in enhancing the spatiotemporal consistency of salient objects, proving that it is capable of handling complex scenes with low contrast between foreground and background, deformable objects, object occlusion, and so on.

Finally, aiming at preserving the fine edges of salient objects, this dissertation proposes a deep attentive graph convolutional neural network for visual SOD, from the perspective of saliency information mining and aggregation over graph-structured nodes under the framework of deep graph convolutional neural networks. More precisely, we devise a unified multi-stream deep graph convolution learning framework, introducing a visual attention module to adaptively select graph nodes and fuse static and dynamic graph embedding features that encode spatiotemporal saliency information. We also present a novel edge-gated graph convolution operator, which improves the representation ability of graph nodes and the performance of video SOD by explicitly mining the relations among graph nodes and aggregating important saliency information from neighboring nodes. Extensive experiments and analysis demonstrate that the proposed method can effectively aggregate important saliency information over graph-structured data, giving the model fine salient object edge preservation and strong learning ability.

In the above research, this dissertation conducts in-depth explorations of the challenges of video SOD in complex scenes from different perspectives and proposes practical, effective solutions to improve the performance of video SOD. The experimental results show that, for scenes with complex moving backgrounds, background interference can be reduced by mining spatiotemporal context information to construct the motion energy and spatiotemporal objectness features, and the graph feature fusion method can effectively suppress this interference. For scenes with low contrast between foreground and background, deformable objects, and motion blur, the continuous consistency of the salient object map can be improved by mining the spatiotemporal constraint information among superpixel nodes and fusing spatiotemporal features on multiple graphs under the manifold regularization framework. When training data are available, the supervised deep graph learning method can effectively mine and aggregate important saliency information over graph-structured data, enabling the model to preserve the fine edges of salient objects.
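To make the first contribution concrete, the sketch below shows one plausible way to combine motion magnitude, motion-orientation contrast, and spatial gradient into a motion-energy map, as the abstract describes. This is a minimal NumPy illustration, not the dissertation's actual formulation; the weighting, the orientation-contrast term, and the function name `motion_energy` are all assumptions.

```python
import numpy as np

def motion_energy(flow_u, flow_v, frame_gray):
    """Toy motion-energy map (illustrative, not the dissertation's model):
    combines optical-flow magnitude, flow-orientation contrast against the
    global mean direction, and the spatial gradient of the frame."""
    # flow magnitude: strong motion is more likely to be salient
    mag = np.hypot(flow_u, flow_v)
    # orientation contrast: deviation of each pixel's flow direction
    # from the (background-dominated) global mean direction
    ang = np.arctan2(flow_v, flow_u)
    ori = np.abs(np.sin(ang - np.mean(ang)))
    # spatial gradient magnitude of the frame itself
    gy, gx = np.gradient(frame_gray.astype(float))
    grad = np.hypot(gx, gy)
    norm = lambda x: (x - x.min()) / (np.ptp(x) + 1e-8)
    # hypothetical fusion: motion terms dominate, gradient refines edges
    return norm(norm(mag) * (1.0 + ori) + 0.5 * norm(grad))
```

In a full pipeline the flow fields would come from an optical-flow estimator and the resulting map would be fed, together with the spatiotemporal objectness map, into the graph-clustering stage.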
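The second contribution's seed-based propagation can be illustrated with the standard closed-form manifold-ranking solution f = (D − αW)⁻¹y, where W is a graph affinity matrix, D its degree matrix, and y a seed indicator. The multi-graph extension is hinted at by a simple convex combination of spatial and temporal affinities; both the combination weight and the function names are illustrative assumptions, not the dissertation's exact scheme.

```python
import numpy as np

def fuse_graphs(W_spatial, W_temporal, w=0.5):
    """Hypothetical multi-graph fusion: convex combination of the
    spatial and temporal affinity matrices."""
    return w * W_spatial + (1.0 - w) * W_temporal

def propagate_saliency(W, seed, alpha=0.99):
    """Manifold-ranking style propagation: the seed indicator diffuses
    over the graph; closed form f = (D - alpha * W)^(-1) y."""
    D = np.diag(W.sum(axis=1))
    f = np.linalg.solve(D - alpha * W, seed.astype(float))
    # rescale to [0, 1] for use as a saliency map
    return (f - f.min()) / (np.ptp(f) + 1e-8)
```

On a chain graph with a single seed node, the propagated score decays with graph distance from the seed, which is the behavior the query-node propagation in the abstract relies on.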
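For the third contribution, the sketch below shows one generic form an edge-gated graph convolution can take: a sigmoid gate computed from node-pair features scales each edge before neighborhood aggregation, so uninformative neighbors contribute less. This is a minimal NumPy stand-in under assumed shapes and names (`W_node`, `W_gate` are hypothetical learned weights), not the operator actually proposed in the dissertation.

```python
import numpy as np

def edge_gated_gcn_layer(X, A, W_node, W_gate):
    """One toy edge-gated graph-convolution step (illustrative):
    X: (n, d) node features, A: (n, n) binary adjacency,
    W_node, W_gate: (d, d) hypothetical learned weight matrices."""
    H = X @ W_node                                   # node feature transform
    # gate g_ij = sigmoid(<x_i W_gate, x_j>) on connected pairs only
    S = (X @ W_gate) @ X.T
    G = (1.0 / (1.0 + np.exp(-S))) * A               # gated adjacency
    G = G / (G.sum(axis=1, keepdims=True) + 1e-8)    # row-normalize
    return np.maximum(G @ H + H, 0.0)                # aggregate + residual + ReLU
```

In the full model such a layer would operate on superpixel-node embeddings from the static and dynamic streams, with the attention module selecting which nodes to fuse.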
Keywords/Search Tags:Salient object detection, Representation of spatiotemporal features, Fusion of features, Graph model, Deep learning, Attention