Action Localization And Recognition Based On Temporal Analysis

Posted on:2022-03-21

Degree:Doctor

Type:Dissertation

Country:China

Candidate:F C Long

Full Text:PDF

GTID:1488306323964339

Subject:Information and Communication Engineering

Abstract/Summary:

With the advent of Web 3.0, advanced technologies such as big data, the mobile Internet, the Internet of Things and parallel computing have driven a surge of artificial intelligence research. Multimedia applications in daily life have become a hotspot in computer science. Compared to static images, videos carry motion and auditory information, which makes them more complex, and thus the temporal dynamics in videos are unique and critical to video analysis. Research in video analysis has proceeded along several directions, such as video object detection, video captioning, and temporal action recognition and localization. Among these, temporal action recognition and localization are necessary for the development of human-computer interaction. The technology allows machines to understand and recognize human behaviors, which benefits various robotic tasks.

However, because action videos are rich in content, naive algorithms that segment videos with sliding windows produce many redundant candidates and do not explore the temporal structure well. Meanwhile, acquiring temporal annotations of actions is very expensive, which limits the capacity of localization models. How to leverage limited temporal annotations to improve the scalability of action localization models is another urgent problem. To solve these two problems, this thesis starts from the analysis of the temporal structure of videos, and then delves into the hierarchical structure, temporal scale and generalization ability of action localization and recognition models. The thesis proposes coarse-to-fine action proposal networks, Gaussian temporal awareness networks, localization based on domain transfer, and weakly supervised pre-training of the network backbone. The contributions are summarized as follows.

(1) By exploring the temporal hierarchical granularities of actions, we propose to localize temporal action proposals in a "coarse-to-fine" manner. To materialize this idea, we propose a coarse-to-fine temporal action proposal approach. The approach first models action proposals with three different actionness curves (namely pointwise, pairwise, and recurrent curves) to produce coarse action proposals. Then a 1D convolutional neural network is employed to refine the temporal boundaries in a fine-grained manner. Finally, a proposal re-ranking network is devised to re-rank the proposals produced by the two stages. Compared to a proposal model that works only at the coarse level, our method leads to 2.5% and 4.1% performance gains in average recall and AUC, which demonstrates the effectiveness of the coarse-to-fine manner for temporal action proposal.

(2) To address the problem of predetermined temporal scales in traditional one-shot action localization models, we propose to predict the interval of each proposal dynamically with Gaussian Temporal Awareness Networks. By learning a Gaussian kernel for each cell of the feature map, the temporal scale of each temporal action proposal is dynamically optimized. Multiple Gaussian kernels that highly overlap with each other can even be mixed to capture action proposals of arbitrary length. Moreover, the values in each Gaussian curve reflect the contextual contributions to the localization of an action proposal. Extensive experiments are conducted on the THUMOS14 and ActivityNet v1.3 datasets, and the proposed approach achieves 1.9% and 1.1% improvements in mAP on the testing sets of the two datasets.

(3) To improve the category scalability of action localization models, we introduce a new transfer learning design that learns action localization for a large set of action categories using only action moments from the categories of interest and temporal annotations of untrimmed videos from a small set of action classes. In detail, we bridge temporal action localization and moment recognition through a weight transfer function, and hallucinate the context of the action moments for localization training. In this work, we successfully extend action localization to 600 categories by utilizing the moment data in the Kinetics-600 dataset.

(4) Since the network backbone is usually fixed during localization model training, performance largely depends on the generalization ability of the backbone. In this thesis, we introduce a weakly supervised method for network backbone training that utilizes large-scale web video data. However, web videos raise two issues: "query ambiguity" (uncertainty of meaning or search intention) and "text isomorphism" (the same syntactic structure shared by different texts). Capitalizing solely on such supervision would mislead video representation learning, so we propose Twin-Turbo Networks (TTN), which calibrate against each other to obtain more accurate supervision. On the downstream action recognition task, weakly supervised pre-training with TTN leads to 2.8%, 1.9% and 2.7% gains in top-1 accuracy on the Kinetics-400 and Something-Something V1 & V2 datasets over the best competitor with fully supervised ImageNet pre-training.
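
As an illustration of the Gaussian kernel idea in contribution (2), the following is a minimal sketch and not the thesis implementation. It assumes a kernel is described by a center and a standard deviation over the positions of a 1D temporal feature map, that the kernel values act as normalized context weights for pooling features, and that the proposal interval is taken as center ± scale·sigma. The function names, the `scale` parameter and the interval mapping are hypothetical; in the thesis the kernels are learned by a network, whereas here the center and sigma are simply given.

```python
import math


def gaussian_weights(center, sigma, length):
    """Normalized Gaussian weights over `length` temporal positions (assumed form)."""
    raw = [math.exp(-((i - center) ** 2) / (2.0 * sigma ** 2)) for i in range(length)]
    total = sum(raw)
    return [w / total for w in raw]


def gaussian_pool(features, center, sigma):
    """Weighted average of per-position feature vectors using the Gaussian weights."""
    weights = gaussian_weights(center, sigma, len(features))
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


def proposal_interval(center, sigma, length, scale=1.0):
    """Hypothetical mapping from a kernel to an interval: center +/- scale * sigma,
    clipped to the valid range of temporal positions."""
    start = max(0.0, center - scale * sigma)
    end = min(float(length - 1), center + scale * sigma)
    return start, end


if __name__ == "__main__":
    # Toy feature map: 10 temporal positions, 2 channels each.
    feats = [[float(t), 1.0] for t in range(10)]
    print(gaussian_pool(feats, center=5, sigma=2.0))
    print(proposal_interval(center=5, sigma=2.0, length=10))
```

Under these assumptions, a larger sigma both widens the proposal interval and spreads the pooling weights over more context, which is one way to read the statement that the values of each Gaussian curve reflect contextual contributions.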

Keywords/Search Tags:

Temporal Action Proposal, Temporal Action Localization, Transfer Learning, Network Pre-training, Action Recognition

Related items

1	Temporal Convolutional Network Based Temporal Action Detection
2	Temporal Action Localization And Action Recognition Based On Deep Learning
3	Research On Temporal Action Location Method Combining Light And Heavy Networks In Untrimmed Video
4	Research On Algorithm Of Temporal Action Detection
5	Research On Temporal Action Detection And Action Recognition Based On Deep Learning
6	Research On Temporal Action Detection Based On Accurate Boundary Prediction
7	Research On Video-based Temporal Action Localization And Recognition
8	Algorithm Of Complex Action Recognition Based On Temporal Proposals
9	Video Action Detection Based On Deep Learning
10	Research On Video Temporal Action Localization Based On Deep Learning