
Research Of Temporal Action Localization Algorithm Based On Weakly-Supervised Deep Learning

Posted on: 2022-08-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y. Y. Li
Full Text: PDF
GTID: 2558307154968549
Subject: Information and Communication Engineering
Abstract/Summary:
Temporal action localization aims to localize the temporal boundaries of action instances and identify their action categories in untrimmed long videos. It is widely used in fields such as autonomous driving, video surveillance, virtual reality, and video retrieval. Fully-supervised temporal action localization uses both action category labels and temporal boundary labels as supervision and requires a large amount of training data; in real-world settings, however, annotating the temporal boundaries of action instances consumes considerable manpower and material resources. Weakly-supervised temporal action localization, which requires only video-level action category labels, was developed to address this problem. This thesis therefore focuses on weakly-supervised temporal action localization; the specific work and research results are as follows:

We propose a multi-branch temporal action localization network, MTALNet, which contains a temporal fusion model and a multi-branch attention model. The temporal fusion model maps video features into a feature space suited to weakly-supervised temporal action localization by fusing the local and global temporal context of the videos. The multi-branch attention model separately models the distinguishable actions, distinguishable background, and ambiguous actions in the videos. Based on the multi-branch attention weights, three temporal class activation sequences are constructed to optimize the action classification loss, so that the network can separate distinguishable action features from distinguishable background features. Experimental results show that the proposed approach outperforms multiple state-of-the-art methods, achieving an average localization precision of 29.6% over different IoU thresholds on the THUMOS-14 dataset.

Most existing weakly-supervised temporal action localization algorithms aggregate distinguishable action features with high activation values to optimize the classification loss. As a result, the network tends to ignore the ambiguous actions in the videos that are difficult to classify, which makes it hard to guarantee the completeness of the localization results. To address this, we design an ambiguous action contrast loss function that refines ambiguous action features under the guidance of distinguishable features, so that the network can perceive precise temporal action boundaries and avoid interruptions within action intervals. Combined with the proposed loss function, MTALNet outperforms previous methods on three weakly-supervised temporal action localization benchmarks: THUMOS-14, ActivityNet-1.2, and ActivityNet-1.3, improving localization precision by 1.5%, 1.3%, and 1.2%, respectively. Visualization results show that the ambiguous action contrast loss effectively reduces the misclassification of ambiguous actions and enables the network to capture more complete action segments.
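The multi-branch attention scheme described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the thesis's actual implementation: the shapes, the three-branch ordering (action / background / ambiguous), and the top-k temporal pooling are all assumptions made for the sake of a runnable example.

```python
import numpy as np

rng = np.random.default_rng(0)
T, C = 20, 5  # number of video snippets, number of action classes

# Snippet-level class activation sequence (logits), e.g. from a classifier head.
cas = rng.normal(size=(T, C))

# Three attention branches (assumed: action / background / ambiguous action),
# softmax-normalized across branches so the three weights sum to 1 per snippet.
branch_logits = rng.normal(size=(T, 3))
attn = np.exp(branch_logits) / np.exp(branch_logits).sum(axis=1, keepdims=True)

def video_score(weighted_cas, k=4):
    """Top-k temporal pooling of a weighted CAS into video-level class logits."""
    topk = np.sort(weighted_cas, axis=0)[-k:]  # k highest activations per class
    return topk.mean(axis=0)

# One attention-weighted CAS per branch; each is pooled into a video-level
# prediction that would feed a separate classification loss term.
scores = [video_score(attn[:, b:b + 1] * cas) for b in range(3)]
probs = [np.exp(s) / np.exp(s).sum() for s in scores]  # softmax over classes
```

With only video-level labels available, the classification loss on these pooled scores is what drives the attention branches apart, which is why the abstract emphasizes separating distinguishable action from distinguishable background features.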
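The ambiguous action contrast loss is described above only at a high level; one plausible reading is an InfoNCE-style contrastive term that pulls ambiguous-action features toward the distinguishable-action features and away from background features. The sketch below is a hypothetical formulation under that assumption (cosine similarity, temperature `tau`, cluster-mean prototypes), not the loss actually used in the thesis.

```python
import numpy as np

def l2norm(x):
    """Normalize vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def contrast_loss(ambiguous, action, background, tau=0.1):
    """InfoNCE-style contrast: for each ambiguous-action feature, treat the
    mean distinguishable-action feature as the positive and the mean
    background feature as the negative, using cosine similarity / tau."""
    amb = l2norm(ambiguous)                 # (N, D) ambiguous snippet features
    pos = l2norm(action.mean(axis=0))       # (D,) action prototype
    neg = l2norm(background.mean(axis=0))   # (D,) background prototype
    s_pos = np.exp(amb @ pos / tau)
    s_neg = np.exp(amb @ neg / tau)
    return float(-np.log(s_pos / (s_pos + s_neg)).mean())
```

Minimizing such a term makes ambiguous snippets resemble the action class they border, which matches the abstract's claim that refining ambiguous features yields more complete action segments without interval interruptions.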
Keywords/Search Tags: Weakly-supervised deep learning, Temporal action localization, Ambiguous action, Temporal class activation sequence, Attention mechanism