
Research On Spatio-Temporal Action Detection Based On Self-Attention

Posted on: 2022-12-28
Degree: Master
Type: Thesis
Country: China
Candidate: X R Ma
Full Text: PDF
GTID: 2568307169981549
Subject: Computer Science and Technology
Abstract/Summary:
Spatio-temporal action detection is an emerging research field in computer vision. Its core task is to identify actions in input videos and to localize the acting objects in space and time. It has practical value in fields such as video surveillance, autonomous driving, and target behavior tracking. Current research focuses on improving detection accuracy while balancing it against detection speed. In response, this thesis proposes solutions covering more effective fusion of spatio-temporal features, optimization of the linking process, mitigation of the influence of action backgrounds, and optimization of module structure. The first end-to-end clip-level spatio-temporal action detector, MOC, is selected as the baseline model. On this basis, a spatio-temporal non-local block based on center-frame fusion and a linking algorithm based on link labels are introduced to improve the overall detection accuracy of the model. To balance speed and accuracy and to improve generalization performance, a lightweight spatio-temporal motion-aware non-local block and a video clip data augmentation algorithm are introduced.

The main contributions of this thesis are:

1. The self-attention moving center detector SAMOC is proposed. A spatio-temporal non-local block based on center-frame fusion is designed: the self-attention mechanism gathers spatio-temporal information into the center frame, which enhances feature representation and improves the accuracy of object localization and classification. The network also outputs a link label for each moving object, which determines how candidate boxes in different frames are linked. This resolves linking errors caused by moving objects being too close to each other, alleviates the dependence on prior knowledge in the linking process, and reduces the influence of the accuracy of the original network's action tubes on the accuracy of the center branch. While maintaining real-time detection, SAMOC achieves better detection accuracy than the baseline on the open-source datasets and obtains the highest mAP at a high IoU threshold.

2. The spatio-temporal non-local action detector NOLA is proposed. To reduce computational cost and training overhead so that the model fits the hardware constraints of real applications, a motion-aware non-local block (M-Nocal) is designed. M-Nocal reduces module overhead through pooling layers and progressive computation, and achieves more flexible and targeted feature enhancement through learnable self-attention weights. Because it is plug-and-play, the module is independent of any concrete network structure, which widens its range of practical application. In the video clip augmentation algorithm, static frames are added proportionally to each associated frame to build an augmented video with a weakened background. Alternating training on original and augmented videos enhances the generalization performance of the network and alleviates the influence of background bias in the dataset. Overall, NOLA trains more stably and efficiently, and keeps accuracy at a high level while balancing speed. NOLA shows good performance and potential in real surveillance video applications.
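The center-frame fusion idea in contribution 1 can be illustrated with a minimal sketch. It rests on simplifications that the abstract does not specify: queries come from the center frame, keys and values are the raw features of every position in every frame (no learned projections), similarity is a scaled dot product, and the gathered result is added back to the center-frame features as a residual. The function name `center_frame_attention` and the nested-list layout `clip[t][n]` are hypothetical and are not taken from the thesis.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def center_frame_attention(clip):
    """Fuse features from all frames into the center frame.

    clip[t][n] is a C-dimensional feature vector (list of floats) for
    spatial position n of frame t. Returns one fused vector per position
    of the center frame. Assumption: identity projections, i.e. no
    learned query/key/value weights.
    """
    center = clip[len(clip) // 2]
    # Every position of every frame acts as both key and value.
    memory = [vec for frame in clip for vec in frame]
    scale = math.sqrt(len(memory[0]))

    fused = []
    for query in center:
        # Scaled dot-product similarity between the query and each key.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale for key in memory
        ]
        weights = softmax(scores)
        # Attention-weighted sum of the values, channel by channel.
        gathered = [
            sum(w * value[c] for w, value in zip(weights, memory))
            for c in range(len(query))
        ]
        # Residual connection keeps the original center-frame features.
        fused.append([q + g for q, g in zip(query, gathered)])
    return fused


# Usage: a clip of 3 frames, 2 spatial positions, 2 channels.
example_clip = [
    [[0.1, 0.2], [0.3, 0.4]],
    [[0.5, 0.6], [0.7, 0.8]],
    [[0.9, 1.0], [1.1, 1.2]],
]
fused_center = center_frame_attention(example_clip)
```

The output has the same shape as the center frame alone, so in this sketch the downstream heads would operate on a single enriched frame. A trained block would normally add learned projections, and a lightweight variant could pool the keys and values before the attention step to shrink the attention computation.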
Keywords/Search Tags: Spatio-temporal action detection, Spatio-temporal non-local block, Link algorithm, Video data augmentation, Self-attention mechanism, Motion-aware