Font Size: a A A

Spatiotemporal Multi-task Network For Video Action Detection

Posted on:2018-11-23Degree:MasterType:Thesis
Country:ChinaCandidate:Y LiuFull Text:PDF
GTID:2348330512499464Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Recently,remarkable progress has been achieved in video action detection by using deep learning techniques.However,for action detection in real-world untrimmed videos,the accuracies of most existing approaches are still far from satisfactory,due to the difficulties in temporal action localization.To tackle this challenge,we propose a spatiotemporal,multi-task,3D deep convolutional neural network to detect(including temporally localize)actions in untrimmed videos.First,we introduce a fusion framework which aims to extract video-level spatiotemporal features in the training phase.And we demonstrate the effectiveness of video-level features by evaluating our model on video action recognition task.Then,under the fusion framework,we propose a spatiotemporal multi-task network,which has two sibling output layers for action classification and temporal localization,respectively.To obtain precise temporal locations,we present a novel temporal regression method to revise the proposal window which contains an action.Meanwhile,in order to better utilize the rich motion information in videos,we introduce a novel video representation,interlaced images,as an additional network input stream.As a result,our model outperforms state-of-the-art methods for both action recognition and detection on standard benchmarks.
Keywords/Search Tags:spatiotemporal features, multi-task, deep learning
PDF Full Text Request
Related items