1) A two-stream convolutional neural network prediction model, ECANet, based on an explicit cyclic spatio-temporal attention mechanism is proposed. Unlike traditional methods that input and model the whole video frame indiscriminately, this work treats the pixels near the attention region as more important, so the global image and the local image of the attention region serve as the model's two input streams, with a 3D convolutional neural network as the backbone to jointly extract temporal and spatial features. A fused-auxiliary pooling mechanism is further proposed to fuse and upsample the global and local attention information into the final saliency prediction map. In addition, the prediction at each time step is passed on as temporal information throughout the iterative video prediction process, breaking the limits imposed by the temporal extent of the 3D convolution kernels and the size of the input sliding window, and improving the temporal modeling capability for long videos.
2) A collective spatio-temporal attention mechanism, COStA, is proposed. It is a lightweight, plug-and-play attention module that computes temporal and spatial weights from the different planes of the three-dimensional video features, and selectively emphasizes informative responses by fully exploiting intra-frame and inter-frame information, strengthening the model's local perception of salient spatio-temporal features. Building on COStA, this work further designs a video saliency prediction model, TASED-COStA, which introduces the spatio-temporal attention mechanism into the classic video saliency model TASED-Net, enabling more accurate video saliency prediction with almost no additional computational cost.
3) Quantitative and qualitative visualization experiments show that the proposed methods outperform related methods published in the same period in terms of both accuracy and computational efficiency.
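The iterative feedback loop described in contribution 1) can be sketched as follows. This is a minimal NumPy sketch, not the actual ECANet forward pass: `predict_saliency` is a hypothetical stand-in for the 3D-CNN model, and the window size, blending weights, and zero initialization are all assumptions made for illustration. The point is only the control flow: each step's prediction re-enters the next step as temporal context, so the effective temporal receptive field grows beyond the fixed sliding window.

```python
import numpy as np

def predict_saliency(window, prev_pred):
    # Hypothetical stand-in for the model's forward pass: average the
    # clip over time and blend in the previous step's prediction.
    frame_mean = window.mean(axis=0)
    return 0.5 * frame_mean + 0.5 * prev_pred

def rolling_prediction(video, win=4):
    """Slide a fixed window over a (T, H, W) video, feeding each
    step's prediction back in as temporal context for the next."""
    T, H, W = video.shape
    prev = np.zeros((H, W), dtype=video.dtype)  # no context at step 0
    preds = []
    for t in range(T - win + 1):
        prev = predict_saliency(video[t:t + win], prev)
        preds.append(prev)
    return np.stack(preds)

video = np.random.rand(10, 8, 8).astype(np.float32)
preds = rolling_prediction(video)  # one map per window position
```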
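The plane-wise weighting idea behind COStA in contribution 2) can be illustrated with the sketch below. This is an assumption-laden simplification, not the published module: the pooling choices (mean over each plane), the sigmoid squashing, and the purely parameter-free multiplicative application are all placeholders for whatever learned transforms COStA actually uses. It only shows how a temporal weight vector and a spatial weight map can be derived from different planes of a (C, T, H, W) feature tensor and broadcast back over it.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def plane_attention(feat):
    """Hypothetical plane-wise spatio-temporal attention.

    feat: video feature tensor of shape (C, T, H, W).
    Temporal weights come from pooling each frame's spatial (H, W)
    plane; spatial weights come from pooling along the temporal axis.
    Both are squashed to (0, 1) and applied multiplicatively.
    """
    # Temporal attention: one weight per frame, from the (C, H, W) slab.
    t_w = sigmoid(feat.mean(axis=(0, 2, 3)))  # shape (T,)
    # Spatial attention: one weight per pixel, pooled over C and T.
    s_w = sigmoid(feat.mean(axis=(0, 1)))     # shape (H, W)
    # Broadcast both weight maps back over the full feature tensor.
    return feat * t_w[None, :, None, None] * s_w[None, None, :, :]

feat = np.random.rand(4, 8, 16, 16).astype(np.float32)
out = plane_attention(feat)  # same shape as feat, re-weighted
```

Because the weights are computed from low-cost global pooling and applied by broadcasting, a module of this shape adds almost no parameters or FLOPs, which is consistent with the "almost no increase in performance load" claim for TASED-COStA.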