Video Action Detection Based On Temporal Analysis

Posted on: 2021-04-14  Degree: Master  Type: Thesis
Country: China  Candidate: T Li  Full Text: PDF
GTID: 2428330623467790  Subject: Computer Science and Technology
Abstract/Summary:
In recent years, deep learning methods have performed well in practice and have driven progress in computer vision and natural language processing. Most of the data generated by human society is video, which makes video analysis a fundamental and critical task in computer vision. Beyond classifying trimmed videos, we also need to analyze untrimmed videos, which contain more complex scenes and more background noise. This leads to a more challenging task, video action detection, which aims to locate action instances in long untrimmed videos and to predict both their action categories and their temporal boundaries. Video action detection has considerable commercial value. In intelligent security, for example, it can be used to locate and identify abnormal actions, which greatly reduces labor costs. The difficulties of video action detection are threefold: 1) video frames are correlated, for example, an action proposal may span many frames, so it is important to model long-term dependencies in the video; 2) action proposals differ in length, so multi-scale information is very important; 3) many action proposals are inaccurate or redundant.

To address these problems, this thesis proposes the following: 1) To model long-term dependencies, we propose a residual temporal convolution module that fuses semantic information from different layers, together with a bi-directional LSTM that captures long-term dependencies. 2) To obtain multi-scale information, we propose a pyramidal context-aware mechanism (PCM) built from a series of temporal dilated convolutions. The shallow layers of PCM respond to short-term action instances, while its deep layers exploit long-term temporal dynamics. Without losing temporal resolution, PCM has a large temporal receptive field and produces a multi-scale feature representation. A dense connectivity structure aggregates the multi-scale contextual features. 3) We propose two ways to extract the features of action proposals. The first is an attention mechanism based on key frames, which identifies the key frames in an action proposal and weights the proposal's features accordingly to obtain a fixed-length feature representation. The second automatically computes a confidence map for all candidates, based on aggregated representations of the relevant temporal units for each candidate, by imitating VLAD quantization.
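The idea behind a stack of temporal dilated convolutions can be illustrated with a small pure-Python sketch. It is not the thesis implementation: the function names, the kernel of three ones, the dilation rates (1, 2, 4), and the zero padding are all assumptions chosen for illustration, and the dense connectivity is simplified to concatenating every layer's output once at the end. The sketch shows that each layer keeps the temporal length unchanged while deeper layers see a wider temporal context.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Same-length 1D dilated convolution with zero padding (odd kernel size)."""
    assert len(kernel) % 2 == 1, "this sketch assumes an odd kernel size"
    half = len(kernel) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - half) * dilation  # taps are spaced `dilation` steps apart
            if 0 <= idx < len(signal):       # positions outside the signal count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked dilated convolutions with stride 1."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def pyramid_features(signal, kernel, dilations):
    """Stack dilated convolutions and concatenate all layer outputs per time step.

    Simplification (assumption): outputs are concatenated once at the end,
    rather than feeding every earlier output into every later layer.
    """
    layer_outputs = []
    current = signal
    for d in dilations:
        current = dilated_conv1d(current, kernel, d)
        layer_outputs.append(current)
    # One multi-scale vector per time step: [layer 1, layer 2, ...]
    return [list(column) for column in zip(*layer_outputs)]


# Hypothetical input: a single impulse in a 16-step sequence.
impulse = [0.0] * 16
impulse[8] = 1.0
features = pyramid_features(impulse, [1.0, 1.0, 1.0], [1, 2, 4])

# Number of time steps influenced by the impulse at each depth.
spread = [sum(1 for f in features if f[layer] != 0) for layer in range(3)]
print(len(features), spread, receptive_field(3, [1, 2, 4]))
```

In this toy setting the impulse reaches only its immediate neighbors in the first layer and a much wider window in the last, while the sequence length stays at 16 throughout, which mirrors the claim that shallow layers respond to short actions and deep layers capture longer temporal context without loss of temporal resolution.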
Keywords/Search Tags:Deep Learning, Video Analysis, Action Detection, Long-term Dependency, Multi-scale Information, Attention Mechanisms