
Research on Deep Temporal Feature Learning Algorithm Based on Self-supervised Learning

Posted on: 2022-10-08    Degree: Master    Type: Thesis
Country: China    Candidate: J L Kang    Full Text: PDF
GTID: 2518306527455354    Subject: Master of Engineering
Abstract/Summary:
Videos provide richer visual information than still images, and the spatio-temporal features extracted from them can be applied to many visual tasks, such as video retrieval and action recognition. In existing training strategies, videos are fed into networks in random order to learn spatio-temporal features. However, we observe that videos differ in their level of frame/clip sequence saliency: the correct frame/clip order is easier to identify for videos with high sequence saliency than for those with low sequence saliency. We therefore believe that making effective use of frame/clip sequence saliency benefits spatio-temporal feature learning and improves the performance of related visual models. The main contents and innovations of this thesis are:

1. We propose a new concept, video sequence saliency (VSS), which measures how difficult it is for a visual model to identify the correct frame/clip order of a video. Based on it, we develop progressive self-supervised spatio-temporal feature learning based on VSS (PSSFL-VSS). The algorithm has two stages: model pre-training and model transfer. For the clip order prediction pretext task, the pre-training strategy feeds videos into the network in descending order of their VSS values, and 3D CNNs (C3D, R3D and R(2+1)D) are used to learn spatio-temporal features. First, the VSS value of each video is updated from the clip order prediction results; the videos are then ranked by the updated VSS values, and a hyper-parameter divides the ranked videos into several groups, which are fed into the network in descending order of VSS rather than randomly as in traditional methods. The VSS values and the video ranking are updated at every iteration until the model converges (a sketch of this training loop follows the abstract). Experiments show that the proposed algorithm improves accuracy over the baseline by 2.9%, with clear gains in clip retrieval, video retrieval and action recognition, which verifies the effectiveness and superiority of the proposed models.

2. To address the difficulty of learning temporal features effectively in the video generation task, this thesis improves the Self-supervised Spatio-temporal Feature Learning Video Generative Adversarial Network (SSFLVGAN). In the generator network G, an L2-regularized loss function is used to mitigate overfitting. In the discriminator network D, a 3D average pooling layer is added after the first four convolution layers to reduce the number of model parameters, so that D can distinguish synthetic videos from real ones and judge whether the temporal relationship of motion between frames is correct (a sketch of this discriminator follows the abstract). Video generation experiments on the related datasets show that, compared with the baseline, the evaluation metrics are effectively improved and the videos generated by SSFLVGAN are more realistic.
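A minimal PyTorch sketch of the VSS-ordered progressive training loop described in item 1, under stated assumptions: the dataset interface (shuffled_clips, video id), the use of clip-order-prediction accuracy as a per-video VSS proxy, and the number of groups are illustrative choices, not details specified in the thesis.

import torch

def clip_order_accuracy(model, video, device="cuda"):
    # Proxy for a video's sequence saliency (VSS): how reliably the model
    # recovers the correct clip order for this video. `shuffled_clips` is a
    # hypothetical dataset method returning shuffled clips and the order label.
    clips, order_label = video.shuffled_clips()
    with torch.no_grad():
        logits = model(clips.to(device))          # predicts a permutation class
    return (logits.argmax(dim=-1) == order_label.to(device)).float().mean().item()

def train_progressive(model, videos, optimizer, criterion,
                      num_groups=4, max_iters=100, device="cuda"):
    vss = {v.id: 0.0 for v in videos}             # VSS value per video
    for _ in range(max_iters):
        # 1) update each video's VSS from the clip order prediction results
        for v in videos:
            vss[v.id] = clip_order_accuracy(model, v, device)
        # 2) rank videos by VSS (descending) and split into groups
        ranked = sorted(videos, key=lambda v: vss[v.id], reverse=True)
        size = max(1, len(ranked) // num_groups)
        groups = [ranked[i:i + size] for i in range(0, len(ranked), size)]
        # 3) train on high-VSS (easier) groups first, then progressively harder ones
        for group in groups:
            for v in group:
                clips, order_label = v.shuffled_clips()
                logits = model(clips.to(device))
                loss = criterion(logits, order_label.to(device))
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
    return model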
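A minimal PyTorch sketch of the discriminator modification described in item 2, interpreting "a 3D average pooling layer after the first four convolution layers" as one pooling layer after each of the first four 3D convolutions; the channel widths, kernel sizes, and classifier head are illustrative assumptions rather than the thesis's exact configuration.

import torch
import torch.nn as nn

class VideoDiscriminator(nn.Module):
    def __init__(self, in_channels=3):
        super().__init__()
        widths = [64, 128, 256, 512]
        layers, c_in = [], in_channels
        for c_out in widths:                      # first four 3D conv blocks
            layers += [
                nn.Conv3d(c_in, c_out, kernel_size=3, padding=1),
                nn.LeakyReLU(0.2, inplace=True),
                nn.AvgPool3d(kernel_size=2),      # added 3D average pooling
            ]
            c_in = c_out
        self.features = nn.Sequential(*layers)
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),
            nn.Flatten(),
            nn.Linear(c_in, 1),                   # real vs. synthetic score
        )

    def forward(self, x):                         # x: (batch, C, T, H, W)
        return self.classifier(self.features(x))

# Example: a batch of 2 clips, 3 channels, 16 frames, 64x64 resolution.
# scores = VideoDiscriminator()(torch.randn(2, 3, 16, 64, 64))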
Keywords/Search Tags: Self-supervised Learning, Spatio-temporal Feature Learning, Clip/Video Retrieval, Action Recognition, Video Generation