
An Improved Action Recognition Method With 3D Convolution Neural Network

Posted on: 2020-08-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y R Chen
GTID: 2428330602452346
Subject: Engineering

Abstract/Summary:
Action recognition in videos has important academic value and broad application prospects. These have made it a research hotspot in computer vision and attracted great interest from researchers and institutions. However, action recognition remains very challenging. Much real-world data comes from web videos or movie clips, which contain substantial camera motion, complex backgrounds, and low within-class compactness. Extracting effective features is therefore crucial for action recognition. This thesis summarizes and analyzes existing action recognition methods and makes the following contributions.

First, traditional 3D convolutional neural networks require a fixed-length input. To remove this requirement, this thesis proposes an adaptive video shot segmentation strategy. The strategy aims to preserve motion information and dependencies over an appropriate temporal range without damaging the semantic structure of the video, so the network can accept inputs of adaptive length. It captures short-term temporal dependencies in the video sequence by taking into account the motion variation between adjacent frames. A spatial-temporal pyramid pooling (STPP) ConvNet then captures the middle-term temporal dependencies of video clips. Next, a long-term temporal pooling method is proposed, which captures the long-term temporal dependencies between video segments by adding temporal order constraints. Together, these components form an adaptive long-term temporal network that produces the final fixed-length Adaptive Long-Term Descriptor (ALTD).

Second, a multi-region attention spatial network is constructed. It combines a global attention network with a local multi-region network to produce the Multi-Regions Attention Descriptor (MRAD), which integrates global and local information. The global attention network makes the global features more discriminative by adding an attention module. The local multi-region network improves the accuracy of local multi-region features by adding a local precision constraint.

Third, an Adaptive Long-Term Descriptor and Multi-Regions Attention Descriptor (ALT-MRA) framework is proposed, which improves action recognition accuracy by integrating the temporal stream and the spatial stream.

All proposed methods are evaluated for action recognition on the UCF101 and HMDB51 datasets and compared with state-of-the-art methods. The experimental results demonstrate the effectiveness of each proposed method. Finally, the thesis summarizes and discusses the research and outlines directions for future work.
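The adaptive shot segmentation step can be illustrated with a short sketch. The abstract does not give the exact segmentation rule, so this is a minimal sketch under stated assumptions. Motion variation is measured as the mean absolute intensity difference between adjacent frames. A cut is placed where that value exceeds the clip mean plus `k` standard deviations. The bounds `min_len` and `max_len` keep each shot within a length range that a 3D CNN can handle. The function `adaptive_shots` and all of its parameters are hypothetical and are not taken from the thesis.

```python
import numpy as np

def adaptive_shots(frames, k=1.0, min_len=4, max_len=32):
    """Split a (T, ...) clip into variable-length shots at motion peaks.

    Hypothetical rule (assumption, not the thesis's method): cut where the
    adjacent-frame difference exceeds mean + k * std, subject to length bounds.
    """
    frames = np.asarray(frames, dtype=float)
    # Motion variation between adjacent frames: mean absolute difference (length T-1).
    motion = np.abs(np.diff(frames, axis=0)).reshape(len(frames) - 1, -1).mean(axis=1)
    threshold = motion.mean() + k * motion.std()
    shots, start = [], 0
    for i, m in enumerate(motion):
        end = i + 1  # a cut after frame i makes frame i + 1 the start of the next shot
        length = end - start
        # Cut at a motion peak (if the shot is long enough) or when the shot hits max_len.
        if (m > threshold and length >= min_len) or length >= max_len:
            shots.append((start, end))
            start = end
    shots.append((start, len(frames)))  # the trailing shot may be shorter than min_len
    return shots
```

For a 40-frame clip with an abrupt scene change at frame 20, this returns `[(0, 20), (20, 40)]`. For a static 100-frame clip, it falls back to `max_len`-sized chunks.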
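Spatial-temporal pyramid pooling is what lets shots of different lengths produce a feature of one fixed size. The sketch below is a generic SPP-style max pooling extended to the time axis. It assumes a `(C, T, H, W)` feature map from the last 3D convolution layer. The pyramid levels `(1, 1, 1)`, `(2, 2, 2)`, and `(4, 4, 4)` are hypothetical choices, because the abstract does not state the thesis's configuration.

```python
import numpy as np

def _bin_edges(length, n_bins):
    """Split [0, length) into n_bins non-empty ranges: [floor(i*L/n), ceil((i+1)*L/n))."""
    return [((i * length) // n_bins, -((-(i + 1) * length) // n_bins)) for i in range(n_bins)]

def stpp(features, levels=((1, 1, 1), (2, 2, 2), (4, 4, 4))):
    """Max-pool a (C, T, H, W) feature map into a fixed-length vector.

    `levels` lists (time, height, width) bin counts per pyramid level (assumed values).
    """
    c, t, h, w = features.shape
    pooled = []
    for nt, nh, nw in levels:
        for t0, t1 in _bin_edges(t, nt):
            for h0, h1 in _bin_edges(h, nh):
                for w0, w1 in _bin_edges(w, nw):
                    # One spatio-temporal cell: keep the per-channel maximum.
                    cell = features[:, t0:t1, h0:h1, w0:w1]
                    pooled.append(cell.max(axis=(1, 2, 3)))
    return np.concatenate(pooled)
```

With these levels the output always has `C * (1 + 8 + 64) = 73 * C` entries, whatever `T`, `H`, and `W` are. Shots of different lengths can therefore feed the same fully connected layer.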
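The abstract does not say how the temporal order constraints in the long-term pooling step are imposed. One well-known way to compress an ordered sequence of segment features into a single order-aware vector is rank pooling. It fits a linear function whose scores increase with time and uses that function's parameters as the descriptor. The sketch below uses a ridge-regression variant of rank pooling as a stand-in. It is an illustrative substitute, not the thesis's long-term pooling method, and the names `rank_pool` and `lam` are hypothetical.

```python
import numpy as np

def rank_pool(segment_features, lam=1.0):
    """Encode the temporal order of N segment features (N, D) as one D-dim vector.

    Stand-in technique (ridge-regression rank pooling), not the thesis's method.
    """
    x = np.asarray(segment_features, dtype=float)
    n, d = x.shape
    # Time-varying mean: average of all segments seen so far, which smooths the sequence.
    v = np.cumsum(x, axis=0) / np.arange(1, n + 1)[:, None]
    t = np.arange(1, n + 1, dtype=float)
    # Center features and time index so the fit captures the trend, not the offset.
    vc = v - v.mean(axis=0)
    tc = t - t.mean()
    # Ridge regression of time on features: w = (Vc^T Vc + lam*I)^-1 Vc^T tc.
    # Scores vc @ w tend to grow with t, so w encodes the temporal evolution.
    return np.linalg.solve(vc.T @ vc + lam * np.eye(d), vc.T @ tc)
```

Unlike average or max pooling, this descriptor depends on the order of the segments. Reversing a sequence whose features rise over time flips the sign of the corresponding weights, and the output size stays `D` for any number of segments.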
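The attention module in the global attention network can be illustrated with generic soft spatial attention pooling. Each spatial location of a `(C, H, W)` feature map receives a relevance score, and the scores are normalized with a softmax. The location features are then averaged using those weights. The abstract does not describe the thesis's attention module, so this is an assumed, simplified form. In a trained network `score_weights` would be learned parameters, whereas here it is just an input array.

```python
import numpy as np

def attention_pool(feature_map, score_weights, bias=0.0):
    """Soft spatial attention pooling of a (C, H, W) map into a C-dim vector.

    Generic form (assumption): score_weights has shape (C,) and scores each location.
    """
    c = feature_map.shape[0]
    x = feature_map.reshape(c, -1)                 # (C, H*W): one column per location
    logits = score_weights @ x + bias              # (H*W,) relevance score per location
    logits = logits - logits.max()                 # subtract max for a stable softmax
    alpha = np.exp(logits) / np.exp(logits).sum()  # attention weights that sum to 1
    pooled = x @ alpha                             # attention-weighted average feature
    return pooled, alpha.reshape(feature_map.shape[1:])
```

When all scores are equal, this reduces to global average pooling. A strongly scored location dominates the pooled feature, which is how attention can make global features more discriminative.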
Keywords/Search Tags: Action Recognition, Deep Learning, Shot Segmentation, Attention Mechanism