An Improved Action Recognition Method With 3D Convolution Neural Network

Posted on:2020-08-17

Degree:Master

Type:Thesis

Country:China

Candidate:Y R Chen

Full Text:PDF

GTID:2428330602452346

Subject:Engineering

Abstract/Summary:


Action recognition in videos has important academic value and broad application prospects, which has rapidly made it a research hotspot in the field of computer vision and attracted great interest from researchers and related institutions. However, action recognition remains a very challenging problem, because some real-world data are obtained from web videos or movie clips, which involve substantial camera motion, complex backgrounds, and a lack of within-class compactness. Consequently, extracting effective features is very important for action recognition. This thesis summarizes and analyzes the existing action recognition methods and makes the following contributions.

Firstly, to address the problem that the traditional 3D convolutional neural network requires a fixed-length input, an adaptive video shot segmentation strategy is proposed. It attempts to preserve motion information and dependencies over an appropriate range without damaging the semantic structure, and it allows the network to accept inputs of adaptive length. This strategy captures the short-term temporal dependency in the video sequence by taking into account the motion variation between adjacent frames. Then, the middle-term temporal dependency of video clips is captured by the spatial temporal pyramid pooling (STPP) ConvNet. Subsequently, a long-term temporal pooling method is proposed, which captures the long-term temporal dependency between video segments by adding temporal order constraints. The resulting adaptive long-term temporal network produces the final fixed-length Adaptive Long-Term Descriptor (ALTD).

Secondly, a multi-regions attention spatial network is constructed. By combining a global attention network and a local multi-regions network, it yields the Multi-Regions Attention Descriptor (MRAD), which integrates global and local information. The global attention network improves the discriminability of the global attention features by adding an attention module, and the local multi-regions network improves the accuracy of the local multi-regions features by adding a local precision constraint.

Thirdly, an Adaptive Long-Term Descriptor and Multi-Regions Attention Descriptor (ALT-MRA) framework is proposed, which improves action recognition accuracy by integrating the temporal stream and the spatial stream. All methods proposed in this thesis are evaluated on the UCF101 and HMDB51 action recognition datasets and compared with state-of-the-art methods, and the experimental results demonstrate their effectiveness. Finally, the thesis summarizes and discusses the research and outlines future work on action recognition.
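
The idea of turning a variable-length clip into a fixed-length descriptor can be illustrated with a small temporal pyramid pooling sketch. This is not the thesis's STPP ConvNet, whose details are not given in the abstract: it only pools precomputed per-frame feature vectors along the time axis, and the function name `temporal_pyramid_pool`, the max-pooling choice, and the pyramid levels `(1, 2, 4)` are assumptions made for illustration.

```python
from typing import List, Sequence


def temporal_pyramid_pool(
    frames: Sequence[Sequence[float]],
    levels: Sequence[int] = (1, 2, 4),  # assumed pyramid levels, not from the thesis
) -> List[float]:
    """Max-pool per-frame feature vectors over a temporal pyramid.

    For a level with n bins, the time axis is split into n roughly equal
    (possibly overlapping) windows and every feature dimension is
    max-pooled inside each window. The output length is
    dim * sum(levels), whatever the number of frames.
    """
    if not frames:
        raise ValueError("need at least one frame")
    t = len(frames)
    dim = len(frames[0])
    descriptor: List[float] = []
    for n in levels:
        for b in range(n):
            start = (b * t) // n                # floor of the bin's left edge
            end = ((b + 1) * t + n - 1) // n    # ceil of the bin's right edge
            window = frames[start:end]          # never empty when t >= 1
            descriptor.extend(max(f[d] for f in window) for d in range(dim))
    return descriptor


# Two clips of different lengths give descriptors of the same size.
short_clip = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
long_clip = [[float(i), float(10 - i)] for i in range(10)]
short_desc = temporal_pyramid_pool(short_clip)
long_desc = temporal_pyramid_pool(long_clip)
print(len(short_desc), len(long_desc))
```

Because the number of bins is fixed by the pyramid levels rather than by the clip length, both descriptors above have 2 × (1 + 2 + 4) = 14 entries, which is the property a fixed-size classifier layer needs.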

Keywords/Search Tags:

Action Recognition, Deep Learning, Shot Segmentation, Attention Mechanism


Related items

1	The Design And Implementation Of Few-shot Video Classification Based On Deep Learning Framework
2	Research On Human Action Recognition Method Based On Deep Learning
3	Studies On Action Recognition In Video Based On Deep Learning
4	Research On Human Action Recognition Method Integrating Visual Attention Mechanism And Deep Learning
5	Action Recognition Based On Two Stream Spatial-Temporal Attention Network
6	Research On Visual Action Recognition Based On Deep Learning
7	Attention Mechanism Based Deep Network For Human Action Recognition In Video
8	A Deep Action Recognition Framework With Discriminability And Temporal Characteristics
9	Action Recognition And Localization Based On Deep Learning
10	Video Action Recognition Technology Research Based On Deep Learning