Video Action Detection Based On Temporal Analysis

Posted on:2021-04-14

Degree:Master

Type:Thesis

Country:China

Candidate:T Li

GTID:2428330623467790

Subject:Computer Science and Technology

Abstract/Summary:

In recent years, deep learning methods have performed well in practice and have driven progress in computer vision and natural language processing. Most of the data generated by human society is video, which makes video analysis a fundamental and critical task in computer vision. It is not enough to classify trimmed videos; untrimmed videos must be analyzed as well, and these contain more complex scenes and more background noise. This leads to another very challenging task, video action detection, which aims to locate action instances in long untrimmed videos by predicting both their action categories and their temporal boundaries. Video action detection has great commercial value. In intelligent security, for example, it can be used to locate and identify abnormal actions, which greatly reduces labor costs.

The difficulties of video action detection can be summarized in three aspects: 1) video frames are correlated, for example an action proposal may span many frames, so it is important to model long-term dependencies in video; 2) action proposals differ in length, so multi-scale information is very important; 3) many action proposals are inaccurate or redundant.

To address these problems, this thesis proposes the following frameworks. 1) To model long-term dependencies, we propose a residual temporal convolution module that fuses semantic information from different layers, together with a bi-directional LSTM that captures long-term dependencies. 2) To obtain multi-scale information, we propose a pyramidal context-aware mechanism (PCM), which consists of a series of temporal dilated convolutions. The shallow layers of PCM respond to short-term action instances, while its deep-level features exploit long-term temporal dynamics. Without losing temporal resolution, PCM has a large temporal receptive field and produces a multi-scale feature representation. We use a dense connectivity structure to aggregate the multi-scale contextual features. 3) We propose two ways to extract the features of action proposals. The first is an attention mechanism based on key frames, which identifies the key frames in an action proposal and then weights the features of the proposal accordingly to obtain a fixed-length feature representation. The second automatically computes a confidence map for all candidates, based on aggregated representations of the relevant temporal units of each candidate, obtained by imitating VLAD quantization.
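The idea behind the pyramidal context-aware mechanism can be illustrated with a small sketch. The code below is not the thesis implementation: it is a minimal pure-Python example, with hypothetical function names, a single feature channel, fixed weights, and assumed dilation rates of 1, 2 and 4. Dense connectivity is simplified here to concatenating the outputs of all layers at each time step. It shows that stacked temporal dilated convolutions keep the temporal length unchanged while the receptive field grows with depth.

```python
from typing import List


def dilated_conv1d(x: List[float], weights: List[float], dilation: int) -> List[float]:
    """Same-length 1D convolution over time with zero padding and a given dilation."""
    center = len(weights) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t + (j - center) * dilation  # taps are spaced `dilation` steps apart
            if 0 <= idx < len(x):              # positions outside the sequence count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Receptive field (in time steps) of a stack of dilated convolutions."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def pyramid_features(x: List[float], weights: List[float],
                     dilations: List[int]) -> List[List[float]]:
    """Stack dilated convolutions and concatenate every layer's output per time step.

    Simplified stand-in for dense aggregation: shallow layers (small dilation)
    carry short-term context, deeper layers carry long-term context.
    """
    layer_outputs = []
    h = x
    for d in dilations:
        h = dilated_conv1d(h, weights, d)
        layer_outputs.append(h)
    return [[layer[t] for layer in layer_outputs] for t in range(len(x))]


if __name__ == "__main__":
    # A single impulse in the middle of a 31-step sequence (assumed toy input).
    signal = [0.0] * 31
    signal[15] = 1.0
    dilations = [1, 2, 4]  # assumed rates, chosen only for illustration
    feats = pyramid_features(signal, [1.0, 1.0, 1.0], dilations)
    print("time steps kept:", len(feats))
    print("receptive field:", receptive_field(3, dilations))
```

In this toy setting, the number of time steps that respond to the impulse widens from layer to layer, while every layer still outputs one value per input time step, which is the property the abstract describes as a large receptive field without loss of temporal resolution.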

Keywords/Search Tags:

Deep Learning, Video Analysis, Action Detection, Long-term Dependency, Multi-scale Information, Attention Mechanisms

Related items

1	Research And Implementation Of Video Action Recognition Based On Long-Time Feature Fusion And Attention Mechanism
2	Video Action Recognition Technology Research Based On Deep Learning
3	Specific Action Detection Algorithm Based On Deep Learning
4	Research On Spatio-Temporal Action Detection In Videos Based On Deep Learning
5	Research On Video Action Recognition Based On Deep Learning
6	Dependency Parsing Research Model Based On Deep Learning
7	Research On Human Action Recognition Method Based On Deep Learning
8	Temporal Information And Multi-Scale Fusion Based Video Object Detection
9	Research On Human Action Recognition Method Integrating Visual Attention Mechanism And Deep Learning
10	Learning Spatiotemporal Features In Video For Action Recognition