Research And Application Of Temporal Action Detection In Videos

Posted on: 2022-05-08; Degree: Master; Type: Thesis
Country: China; Candidate: Y S Liu
GTID: 2518306572469294; Subject: Computer technology
Abstract/Summary:
Temporal action detection aims to predict the category, start time, and end time of each action instance in untrimmed videos. It has been applied in areas such as video search and intelligent security. Although existing temporal action detection algorithms have made remarkable progress, they still struggle to meet the requirements of practical applications, which makes the task one of the active research topics in computer vision. This thesis addresses three specific practical requirements: temporally detecting actions, extending detection to new action categories, and localizing actions via a query sentence. The main contributions are summarized as follows:

(1) This thesis proposes a pair loss for temporal action detection to tackle the high boundary false alarm rate and the redundant results of existing algorithms. It introduces the action category label into the proposal generation stage during training. With this category information, the pair loss is computed so that features from the same action category or the same temporal phase have closer representations. The method predicts starting and ending points precisely and improves performance at high temporal Intersection-over-Union (tIoU) thresholds. Extensive experiments demonstrate a remarkable improvement in average recall, especially when the number of proposals is small. Moreover, the generated proposals have more precise temporal boundaries.

(2) This thesis proposes an example-driven detection network for weakly supervised temporal action detection to tackle the limited applicability of existing algorithms to newly emerging action categories that do not appear in the original training dataset. It automatically trains a self-attention module with a VAE reconstruction constraint and a feature discrepancy constraint to localize all relevant action instances according to given example videos or images. Extensive experiments demonstrate that this method requires less training data and is especially suitable for localizing new action categories, so it can meet the requirements of different scenarios.

(3) This thesis proposes a dense confidence map for temporal action localization via a query sentence to tackle the problem that existing algorithms process each candidate moment individually without considering the relationships among them. It constructs a two-dimensional cross-modal feature map over the candidate moments to model their temporal dependencies, and then performs convolution directly over the feature map to generate confidence scores. Extensive experiments demonstrate that this method can perceive content-adaptive context information from neighboring candidate moments and localize actions simply and effectively according to query sentences, without prior information or complex post-processing.

Based on the above work, this thesis designs and implements a prototype system for temporal action detection, which can accurately and effectively locate actions in videos according to different requirements.
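
As an illustration of two notions used above, the following minimal Python sketch computes the temporal Intersection-over-Union between two segments and lays candidate moments out on a two-dimensional grid. It is not the thesis's implementation: the grid layout (cell (i, j) covers clips i through j), the fixed clip length, and the names `temporal_iou`, `candidate_moment_map`, and `iou_map` are all assumptions made for this example, and the cells here hold only time spans rather than the cross-modal features described in contribution (3).

```python
from typing import List, Optional, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def temporal_iou(a: Segment, b: Segment) -> float:
    """Temporal IoU: overlap length divided by the length of the union."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def candidate_moment_map(num_clips: int, clip_len: float) -> List[List[Optional[Segment]]]:
    """Hypothetical layout: cell (i, j) with j >= i is the moment spanning
    clips i..j; cells with j < i are invalid and stay None."""
    grid: List[List[Optional[Segment]]] = [[None] * num_clips for _ in range(num_clips)]
    for i in range(num_clips):
        for j in range(i, num_clips):
            grid[i][j] = (i * clip_len, (j + 1) * clip_len)
    return grid


def iou_map(grid: List[List[Optional[Segment]]], gt: Segment) -> List[List[float]]:
    """tIoU of every candidate moment with one ground-truth segment
    (0.0 for invalid cells)."""
    return [[temporal_iou(seg, gt) if seg is not None else 0.0 for seg in row]
            for row in grid]


if __name__ == "__main__":
    grid = candidate_moment_map(num_clips=4, clip_len=2.0)
    for row in iou_map(grid, gt=(2.0, 6.0)):
        print(["%.2f" % v for v in row])
```

In a layout like this, neighboring cells correspond to moments with similar boundaries, which is why a convolution over the grid can mix information from adjacent candidates. A tIoU map of this kind could also serve as a training target for per-cell confidence scores, though whether the thesis does so is not stated in the abstract.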
Keywords/Search Tags: Computer Vision, Video Understanding, Temporal Action Detection, Self-attention Mechanism, Pair Loss