Action Detection Based On Deep Learning

Posted on:2020-11-16

Degree:Master

Type:Thesis

Country:China

Candidate:S G Wang

Full Text:PDF

GTID:2428330596476182

Subject:Signal and Information Processing

Abstract/Summary:

PDF Full Text Request

In order to meet the growing demand of video analysis and related applications,deep learning based temporal action detection method has attracted more and more attention.Based on the theoretical framework of object detection,this paper focuses on the description of temporal and spatial characteristics of human activities,scene information and context information modeling,and the construction of human action detection network framework.On the whole,this thesis studied action action detection from the following perspectives.1.Cascaded boundary-sensitive temporal proposal and multi-task learning for action detection.This method adopt the framework that first generates temporal proposals,and then make classification of them.In the temporal proposal generation stage,this thesis first evaluate whether each sampling frame contains action and whether it is the start boundary or end boundary of an action instance,obtaining three score sequences.Then construct a preliminary proposal set based on these three score sequences according to certain rules.Finally,filtering the temporal proposals generated in the previous step.In the temporal proposal classification stage,the entire video feature sequence is used to extract the features for each proposal,and then the overlap loss and the softmax loss are introduced to multi-task modeling.2.3D convolution and temporal region proposal based net for action detection.This method extends the 2D object detection framework(Faster RCNN)to 3D temporal action detection(R-C3D)for end-to-end training.Among them,three-dimensional convolution hierarchical network shares weights to extract task-related features of the entire video;temporal proposal subnetwork is used to generate temporal proposals;and action classification subnetwork is used to classify and fine-tune the generated proposals.The method train the network by optimizing both the classification and regression tasks jointly for the two subnets.3.Multi-scale based context-aware net for action detection.This method mainly considers the insufficiency of R-C3 D in processing context information of temporal actions.In order to find a suitable context for temporal actions of different scales,the context features of different resolutions and context scales are constructed.Then a two-branch structure is constructed with the idea of “candidate-control”.One branch generates contextual feature candidates,and the other branch controls the passing of candidate contextual features with a gate function(“sensitivity”).Subsequently,a lot of ablation experiments have been done to investigate the influence factors: the effects of different contextual regions,their scales and gate function selection.The results show that the proposed method can effectively integrate the contextual information of multi-scales,and further improve the detection performance of the whole algorithm(more than 10% improvement on THUMOS'14 dataset).

Keywords/Search Tags:

human action detection method, temporal action detection, temporal activity detection, context-aware, boundary sensitive

PDF Full Text Request

Related items

1	Research On Temporal Action Detection Based On Relation Aware And Global Context
2	Research On Temporal Action Detection Algorithm Based On Boundary Matchin
3	Research On Temporal Action Detection Based On Accurate Boundary Prediction
4	Temporal Convolutional Network Based Temporal Action Detection
5	Research On Human Action Analysis And Recognition Method Based On Deep Learning
6	Research On Temporal Action Location Method Combining Light And Heavy Networks In Untrimmed Video
7	Research On Temporal Action Detection In Video
8	Research On Temporal Action Detection Based On Neural Network
9	Design And Implementation Of Context Cascade Network For Video Temporal Action Detection
10	Research On Action Detection Method Based On Untrimmed Video