Research On Action Recognition Algorithm Based On Spatiotemporal Modeling And Its Application

Posted on: 2022-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: X Wu
GTID: 2518306335466334
Subject: Control Engineering
Abstract/Summary:
Human action recognition has many applications, including smart city security, video retrieval, human-machine interaction, and unmanned supermarkets, and its importance is increasingly prominent. Effectively characterizing the spatiotemporal features in videos is a key problem in human action recognition. To better model the spatiotemporal information of a video, many studies adopt 3D convolutions combined with optical flow or other supplementary motion information, while overlooking that this makes the network harder to optimize and increases its size and computational complexity. In addition, when action instances in videos have different durations, many studies simply stack convolutions with local receptive fields to handle long-term dependencies. Information from distant frames is weakened as a result, so the spatiotemporal modeling is suboptimal.

In view of these shortcomings, this thesis studies human action recognition algorithms and their application, with the aim of reducing computation and improving accuracy. First, it proposes an efficient spatiotemporal modeling method. Then the algorithm is extended to a multi-view action recognition framework: an action recognition algorithm based on multi-view feature fusion is proposed and applied to abnormal action recognition in lifts. The main work and contributions are as follows:

1. The motion information in the features is enhanced. Without increasing the amount of computation, and targeting the shortcomings of existing methods, the motion information in the features is enhanced and integrated into the overall spatiotemporal feature learning framework.

2. A multi-scale spatiotemporal feature aggregation module is designed. The module addresses the different durations of action instances in videos, especially the problem of long-term modeling. Unlike existing methods that simply stack spatiotemporal convolutions or fuse features at a late stage, it aggregates multi-scale inter-frame information through a multi-level residual structure and models long-term temporal features more effectively. Finally, the Motion Feature Enhancement Module (MFEM) and the Multi-scale Spatiotemporal Modeling Module (MSMM) are integrated into a unified 2D convolutional network, which constitutes the motion-enhanced spatiotemporal multi-scale feature aggregation method for action recognition proposed in this thesis.

3. A multi-view action recognition algorithm based on viewpoint-aware attention is implemented and applied to abnormal action recognition in lifts. Building on the motion-enhanced spatiotemporal multi-scale feature aggregation method, and to address the occlusion and semantic loss that occur in a single view, a multi-view action recognition algorithm based on viewpoint-aware attention feature fusion is further proposed. To achieve better multi-view fusion, this thesis designs a Channel-wise Viewpoint-aware Attention (CWVAA) module, which captures the information that distinguishes different views so that multi-view features can be combined effectively. The method is then applied to lifts to mitigate the occlusion that often occurs in lift camera views and to improve the accuracy of abnormal action recognition in lifts.
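The abstract does not specify how the CWVAA module computes its attention, so the following is only a minimal illustrative sketch of the general idea of channel-wise attention over views, not the thesis's implementation. It assumes each camera view has already been reduced to a feature vector with the same number of channels. The names `fuse_views` and `temperature` are hypothetical, and the attention score used here (the raw activation) is a stand-in for whatever learned scoring the actual module uses.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_views(view_features, temperature=1.0):
    """Fuse per-view feature vectors with channel-wise attention over views.

    view_features: list of V lists, each holding C floats (one vector per view).
    Returns (fused, weights); weights[v][c] sums to 1 over v for every channel c.
    """
    num_views = len(view_features)
    num_channels = len(view_features[0])
    if any(len(f) != num_channels for f in view_features):
        raise ValueError("all views must have the same number of channels")

    weights = [[0.0] * num_channels for _ in range(num_views)]
    fused = [0.0] * num_channels
    for c in range(num_channels):
        # Assumption: the activation itself acts as the attention score.
        # A trained module would produce this score with learned layers.
        scores = [view_features[v][c] / temperature for v in range(num_views)]
        attn = softmax(scores)
        for v in range(num_views):
            weights[v][c] = attn[v]
            fused[c] += attn[v] * view_features[v][c]
    return fused, weights


if __name__ == "__main__":
    # Two hypothetical views; the second channel is weak (e.g. occluded) in view 0.
    views = [[1.0, 0.0], [1.0, 2.0]]
    fused, weights = fuse_views(views)
    print(fused)
    print(weights)
```

In this toy setup, a channel that is weak in one view contributes less to the fused feature than the same channel in a view where it is strong, which is the kind of behavior one would want when a single view is occluded.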
Keywords/Search Tags: action recognition, multi-scale spatiotemporal features, multi-view feature aggregation, motion features, attention mechanism