1) A two-stream convolutional neural network prediction model, ECANet, based on an explicit cyclic spatio-temporal attention mechanism is proposed. Unlike traditional methods that input and model the whole video frame indiscriminately, this work treats the pixels near the attention region as more important, so the global image and the local image of the attention region serve as the model's two input streams, with a 3D convolutional neural network as the backbone to jointly extract temporal and spatial features. A fused-auxiliary pooling mechanism is further proposed to fuse and upsample the global and local attention information into the final saliency prediction map. In addition, the prediction at each time step is passed on as temporal information throughout the iterative video prediction process, breaking the limits imposed by the temporal extent of the 3D convolution kernels and the size of the input sliding window, and improving the temporal modeling capability for long videos.
2) A collective spatio-temporal attention mechanism, COStA, is proposed. It is a lightweight, plug-and-play attention module that computes temporal and spatial weights from the different planes of the three-dimensional video features, and selectively emphasizes informative responses by fully exploiting intra-frame and inter-frame information, strengthening the model's local perception of salient spatio-temporal features. Building on COStA, this work further designs a video saliency prediction model, TASED-COStA, which introduces the spatio-temporal attention mechanism into the classic video saliency model TASED-Net, enabling more accurate video saliency prediction with almost no additional computational cost.
3) Quantitative and qualitative visualization experiments show that the proposed methods outperform related methods published in the same period in terms of both accuracy and computational efficiency.
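The iterative feedback loop described in contribution 1) can be sketched as follows. This is a minimal NumPy sketch, not the actual ECANet forward pass: `predict_saliency` is a hypothetical stand-in for the 3D-CNN model, and the window size, blending weights, and zero initialization are all assumptions made for illustration. The point is only the control flow: each step's prediction re-enters the next step as temporal context, so the effective temporal receptive field grows beyond the fixed sliding window.

```python
import numpy as np

def predict_saliency(window, prev_pred):
    # Hypothetical stand-in for the model's forward pass: average the
    # clip over time and blend in the previous step's prediction.
    frame_mean = window.mean(axis=0)
    return 0.5 * frame_mean + 0.5 * prev_pred

def rolling_prediction(video, win=4):
    """Slide a fixed window over a (T, H, W) video, feeding each
    step's prediction back in as temporal context for the next."""
    T, H, W = video.shape
    prev = np.zeros((H, W), dtype=video.dtype)  # no context at step 0
    preds = []
    for t in range(T - win + 1):
        prev = predict_saliency(video[t:t + win], prev)
        preds.append(prev)
    return np.stack(preds)

video = np.random.rand(10, 8, 8).astype(np.float32)
preds = rolling_prediction(video)  # one map per window position
```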
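The plane-wise weighting idea behind COStA in contribution 2) can be illustrated with the sketch below. This is an assumption-laden simplification, not the published module: the pooling choices (mean over each plane), the sigmoid squashing, and the purely parameter-free multiplicative application are all placeholders for whatever learned transforms COStA actually uses. It only shows how a temporal weight vector and a spatial weight map can be derived from different planes of a (C, T, H, W) feature tensor and broadcast back over it.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def plane_attention(feat):
    """Hypothetical plane-wise spatio-temporal attention.

    feat: video feature tensor of shape (C, T, H, W).
    Temporal weights come from pooling each frame's spatial (H, W)
    plane; spatial weights come from pooling along the temporal axis.
    Both are squashed to (0, 1) and applied multiplicatively.
    """
    # Temporal attention: one weight per frame, from the (C, H, W) slab.
    t_w = sigmoid(feat.mean(axis=(0, 2, 3)))  # shape (T,)
    # Spatial attention: one weight per pixel, pooled over C and T.
    s_w = sigmoid(feat.mean(axis=(0, 1)))     # shape (H, W)
    # Broadcast both weight maps back over the full feature tensor.
    return feat * t_w[None, :, None, None] * s_w[None, None, :, :]

feat = np.random.rand(4, 8, 16, 16).astype(np.float32)
out = plane_attention(feat)  # same shape as feat, re-weighted
```

Because the weights are computed from low-cost global pooling and applied by broadcasting, a module of this shape adds almost no parameters or FLOPs, which is consistent with the "almost no increase in performance load" claim for TASED-COStA.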