Font Size: a A A

Video Salient Object Detection Research Based On Deep Learning

Posted on:2024-06-28Degree:MasterType:Thesis
Country:ChinaCandidate:H X GaoFull Text:PDF
GTID:2568307103974409Subject:Control Engineering
Abstract/Summary:PDF Full Text Request
In recent years,video salient object detection has attracted growing interest.At the same time,with the development of deep learning technology,more scholars have focus on video salient object detection based on deep learning.However,the existing models generally have problems such as insufficient mining and improper utilization of spatial and temporal information,and insufficient fusion of features at different levels,resulting in incomplete highlighting of significant goals,blurred edges,and prominent background areas in saliency maps.To solve the above problems,this paper proposes three video salient object detection models based on deep learning,as follows:(1)STI-Net: Spatiotemporal integration network for video saliency detection.The model consists of three key parts,including feature aggregation,saliency prediction and saliency fusion.In the feature aggregation module,temporal information and spatial information guide each other to generate spatiotemporal depth features,and at the same time generate rough boundary clues for subsequent guidance to the saliency prediction module.In the saliency prediction module,use the boundary clues repair the defects of saliency prediction at each level,and further strengthen the spatiotemporal depth saliency features.In the saliency fusion module,the features of each level and the original information are fused to generate the final saliency maps.In addition,in order to make the model easy to train and to improve performance,it also introduces the operation of "shortcut connection" in the model.Experimental results on the public dataset show that the model has achieved comparable performance to the existing top models.(2)Quality-driven dual-branch feature integration network for video salient object detection.Firstly,the quality-driven multi-modal feature fusion module is used to combine spatial features and temporal features,and the quality scores estimated from each level’s spatial and temporal cues are not only used to weigh the two modal features but also to adaptively integrate the coarse spatial and temporal saliency predictions into the guidance map,which further enhances the two modal features.Secondly,the dualbranch-based multi-level feature aggregation module is used to fuse the multi-level spatiotemporal features,where the two branches including the progressive decoder branch and the direct concatenation branch sufficiently explore the cooperation of multi-level spatiotemporal features.In particular,in order to provide an adaptive fusion for the outputs of the two branches,the dual-branch fusion unit is adopted to learn the channel attention weights of the two decoding features based on the attention mechanism,so as to highlight important information and make the decoding results of the two branches fully fused and complementary.Experimental results on the public dataset show that the model has achieved superior performance compared with the existing top models,which fully confirms the effectiveness of the model.(3)Video salient object detection model based on graph convolutional network.On the one hand,the inter-level interaction module(Inter GCNs)is adopted,the spatiotemporal features of different levels are used as graph nodes,the edges are constructed according to the distance information between cross-modal features and cross-level features,and the node features are updated through the graph convolution network to generate spatiotemporal depth features,and cross-modal and cross-level features are integrated in the channel dimension.On the other hand,the intra interaction module(Intra GCNs)is used to map the spatiotemporal depth features in the spatial dimension,characterize the corresponding spatial regions with several nodes,update the node features through the graph convolution network,model the semantic relationship in space,strengthen the connection between the significant regions,and completely highlight the salient object.Experimental results on the public dataset show that the initial attempt of introducing the graph convolutional network into the model is effective,showing the performance comparable to the existing advanced models.
Keywords/Search Tags:video salient object detection, spatiotemporal feature aggregation, quality-driven, dual-branch feature integration, graph convolutional network
PDF Full Text Request
Related items