
Temporal Action Localization In Massive Multimedia Video Scenario

Posted on: 2020-12-26
Degree: Master
Type: Thesis
Country: China
Candidate: H R Li
Full Text: PDF
GTID: 2518306518464784
Subject: Information and Communication Engineering

Abstract/Summary:
Temporal action localization is a challenging task that requires not only determining the category of video clips but also identifying the temporal boundaries (start and end time points) of action instances in untrimmed videos. In the era of big data, the explosive growth of multimedia information such as video highlights the importance of automatic behavior analysis and detection. However, because of the complex scene information in real multimedia videos and the complexity of human behavior, it remains difficult to design a robust, portable, and high-precision action localization algorithm. In this study, we developed a novel method for effectively extracting spatial and temporal features, generating features robustly and accurately, and achieving high-performance temporal action localization. The main innovations of this paper are:

1) Attention-based feature extraction and fusion. We propose an attention-based module that adaptively extracts important features and flexibly fuses spatial and temporal features to generate high-level semantic features.

2) Explicit discrimination and extraction of long- and short-term features. We apply two feature-extraction modules to extract long-term and short-term features respectively. Auxiliary losses placed at the output of the short-term feature-extraction module make each module focus on its own feature extraction, boosting the accuracy of the algorithm.

3) CNN-based long-term feature extraction module. We apply a CNN to extract long-term features of videos and use structured temporal pooling to dynamically adjust the receptive field in the time domain, so the receptive field of the extraction module has no fixed upper bound. Our module can also effectively extract global features of actions that last longer.

As a demonstration of its effectiveness, our method achieved state-of-the-art performance on two challenging datasets, THUMOS14 and ActivityNet.
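To illustrate the kind of attention-based fusion described in innovation 1), the following is a minimal NumPy sketch. It is not the thesis's actual module: the per-snippet scoring vector `w`, the scalar gate per stream, and the feature shapes are all illustrative assumptions. The idea shown is that a learned score turns into softmax attention weights that form a convex combination of the spatial and temporal feature streams for each video snippet.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(spatial, temporal, w):
    """Fuse per-snippet spatial and temporal features with an attention gate.

    spatial, temporal: arrays of shape (num_snippets, feat_dim)
    w: hypothetical learned scoring vector of shape (feat_dim,)
    Returns fused features of shape (num_snippets, feat_dim).
    """
    # Score each stream per snippet, then normalize across the two streams.
    scores = np.stack([spatial @ w, temporal @ w], axis=-1)  # (num_snippets, 2)
    gate = softmax(scores, axis=-1)                          # weights sum to 1
    # Convex combination of the two streams, elementwise per snippet.
    return gate[:, :1] * spatial + gate[:, 1:] * temporal

rng = np.random.default_rng(0)
S = rng.normal(size=(5, 8))   # spatial (appearance) features
T = rng.normal(size=(5, 8))   # temporal (motion) features
F = attention_fuse(S, T, rng.normal(size=8))
```

Because the gate weights are a softmax, each fused snippet feature lies between the corresponding spatial and temporal values, so neither stream is ever entirely discarded; the weighting simply shifts adaptively per snippet.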
Keywords/Search Tags:Action localization, Attention, Spatial-temporal feature, Video content analysis, Convolutional neural networks, Supervised learning