With the rapid development of multimedia and Internet communication technology in recent years, and the rise of short-video and live-streaming platforms, video has permeated everyday life. To serve users' needs more accurately and efficiently, research on video understanding and feature extraction has flourished. Video carries the most complex information of all multimedia data: a video can be regarded as an ordered sequence of images, so it has not only the spatial dimension of images but also a temporal dimension. This thesis conducts in-depth research on the extraction of spatio-temporal video features. Guided by two different design concepts, we pursue two research directions: a mutually reinforced spatio-temporal convolutional tube and a multi-scale spatial-temporal integration convolutional tube. The main contributions are as follows:

(1) Mutually Reinforced Spatio-Temporal Convolutional Tube. We propose a Mutually Reinforced Spatio-Temporal convolutional Tube (MRST) for robust and accurate human action recognition. The MRST consists of a spatio-temporal decomposition unit, a mutually reinforced unit, and a spatio-temporal fusion unit. The decomposition unit extracts spatial and temporal features with a 2D convolution and a 1D convolution, respectively. The mutually reinforced unit learns the correlation between spatial and temporal information and uses this correlation to strengthen the discriminative capability of both the spatial and the temporal features. The fusion unit selectively emphasizes informative spatial and temporal features and fuses them into effective spatio-temporal feature maps. Experimental results show that the proposed deep MRST-Net achieves state-of-the-art performance on multiple action recognition datasets.

(2) Multi-Scale Spatial-Temporal Integration Convolutional Tube. We propose a
novel and efficient Multi-Scale Spatial-Temporal Integration (MSTI) convolutional tube that achieves accurate action recognition at a lower computational cost. The MSTI tube consists of a multi-scale convolution block and cross-scale attention weighted blocks. The multi-scale convolution block divides the input tensor into several groups, each with its own convolutional filters. The cross-scale attention weighted blocks integrate multi-scale spatial and temporal features, selectively emphasizing informative spatio-temporal features while suppressing less useful ones. Benefiting from these two blocks, our MSTI-Net requires significantly fewer computational resources while achieving state-of-the-art action recognition accuracy.
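To make the decomposition idea concrete, the sketch below factorizes a spatio-temporal convolution into a per-frame 2D spatial convolution followed by a per-pixel 1D temporal convolution, in the spirit of the MRST decomposition unit. This is an illustrative NumPy toy, not the thesis's implementation; the function names, kernel sizes, and single-channel setting are assumptions.

```python
import numpy as np

def spatial_conv2d(clip, kernel):
    """Apply one k x k spatial filter to every frame ('valid' padding)."""
    T, H, W = clip.shape
    k = kernel.shape[0]
    out = np.zeros((T, H - k + 1, W - k + 1))
    for t in range(T):
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                out[t, i, j] = np.sum(clip[t, i:i + k, j:j + k] * kernel)
    return out

def temporal_conv1d(clip, kernel):
    """Apply one length-t temporal filter at every spatial location."""
    T, H, W = clip.shape
    t = kernel.shape[0]
    out = np.zeros((T - t + 1, H, W))
    for n in range(T - t + 1):
        # Weighted sum over a window of t consecutive frames.
        out[n] = np.tensordot(kernel, clip[n:n + t], axes=([0], [0]))
    return out

clip = np.random.rand(8, 16, 16)                       # T x H x W video clip
spatial = spatial_conv2d(clip, np.ones((3, 3)) / 9.0)  # 2D conv: per frame
fused = temporal_conv1d(spatial, np.ones(3) / 3.0)     # 1D conv: across frames
print(fused.shape)  # (6, 14, 14)

# One motivation for factorization: fewer parameters per filter than a
# full 3x3x3 3D kernel (27 weights vs 9 + 3 = 12).
print(3 * 3 * 3, "vs", 3 * 3 + 3)
```

The separate spatial and temporal outputs are what a mutually reinforced unit could then cross-correlate before fusion.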
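The multi-scale block's channel-splitting idea can likewise be sketched in a few lines: the channels are split into groups, each group is filtered at its own scale, and a softmax over the groups' mean responses re-weights them, loosely mirroring the cross-scale attention. Everything here (group count, moving-average filters, the attention score) is a simplified assumption for illustration only.

```python
import numpy as np

def avg_filter_1d(x, k):
    """Moving-average filter of width k along the last axis ('same' length)."""
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)), mode="edge")
    return np.stack([xp[:, i:i + x.shape[1]] for i in range(k)]).mean(axis=0)

def multi_scale_block(feat, kernel_sizes=(1, 3, 5, 7)):
    """Split channels into groups; filter each group at its own scale."""
    groups = np.array_split(feat, len(kernel_sizes), axis=0)
    outs = [avg_filter_1d(g, k) for g, k in zip(groups, kernel_sizes)]
    # Cross-scale attention (toy version): softmax over each scale's
    # global average response, then re-weight the groups.
    scores = np.array([o.mean() for o in outs])
    weights = np.exp(scores) / np.exp(scores).sum()
    return np.concatenate([w * o for w, o in zip(weights, outs)], axis=0)

feat = np.random.rand(8, 32)   # 8 channels x 32 spatial positions
out = multi_scale_block(feat)
print(out.shape)  # (8, 32)
```

Because each group convolves only its own slice of the channels, the parameter and FLOP count drops relative to filtering all channels at every scale, which is the source of the efficiency claim.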