
Research On Human Action Modeling And Recognition In Videos

Posted on: 2021-03-25
Degree: Master
Type: Thesis
Country: China
Candidate: X W Wang
GTID: 2428330611473229
Subject: Control Science and Engineering
Abstract/Summary:
Human action recognition in videos is an active research topic in computer vision. Its task is to use computer vision and deep learning algorithms to automatically analyze and identify human actions in videos, and it is widely applied in video surveillance, internet video analysis, smart homes, human-computer interaction, and shopping behavior analysis. Because of inter-class similarity and intra-class variation in human movements, together with the complexity of the surrounding scenes, building human action recognition models poses many challenges. This thesis focuses on the problems encountered in modeling human action recognition. The specific work is as follows:

(1) Under complicated conditions such as varying camera viewpoints, human body postures, and scenes, simply stacking more layers in a 3D convolutional neural network to extract effective visual features easily causes vanishing gradients and over-fitting, which lowers the action recognition rate. To address these problems, this thesis proposes a method based on a double residual convolutional network. By nesting a residual network inside a residual network, the double residual network fully integrates shallow and deep visual features, alleviates the impact of vanishing gradients, and improves the performance of the residual network. The proposed model is tested and evaluated on the Multiple Cameras Fall Dataset and the UR Fall Dataset. The results show that the double residual network outperforms fall recognition methods based on the 3D convolutional network, the 3D residual network, the Pseudo-3D residual network, and the (2+1)D residual network, which verifies the effectiveness of the proposed model for abnormal behavior recognition.

(2) In temporal action localization and recognition in videos, existing temporal action proposal methods do not handle long-term dependencies well, which results in low proposal recall. To address this problem, a temporal action proposal method based on context information fusion is proposed. First, the spatiotemporal features of video units are extracted by a 3D convolutional network. Then a bidirectional recurrent network is used to model the contextual relationships for predicting temporal action proposals. Considering the large number of parameters and the vanishing gradient problem in the Gated Recurrent Unit (GRU), a Simplified-GRU (S-GRU) is proposed, in which the gating structure is controlled by the input features to enhance parallel computing capability, and a weighted average is introduced to strengthen the unit's ability to adaptively fuse historical and current information. Experimental results on the Thumos14 dataset demonstrate that the temporal action proposal method based on the bidirectional S-GRU improves proposal recall.

(3) Long videos contain a large number of background segments or frames, which makes it difficult for the recurrent network to capture the motion regions of interest and reduces the recall of temporal action proposals. To address this problem, this thesis introduces two attention-guided networks, a multi-head attention network and a background suppression network, to enhance temporal correlation within videos and improve proposal recall. In the training phase, a multi-task loss is used to jointly train the background suppression network and the temporal proposal network. During the testing phase, the background suppression network and the multi-head attention network adaptively output attention weights to guide the temporal localization task. Experiments on temporal action proposals and action detection were performed on the public Thumos14 dataset. The results show that the proposed method improves human action recognition.
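
The abstract does not give the S-GRU equations from item (2), so the following is only a minimal sketch of one plausible reading. It assumes that the gate and the candidate state are computed from the input features alone, and that the new hidden state is a gate-weighted average of the previous state and the candidate. All function names, parameter names, and the exact update rule are hypothetical and are not taken from the thesis.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def linear(W, b, x):
    """Return W @ x + b, where W is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def s_gru_pass(xs, params):
    """Run one direction of a hypothetical simplified GRU over a sequence.

    Assumed form (an illustration, not the thesis's definition):
        z_t = sigmoid(W_z x_t + b_z)            gate from the input only
        c_t = tanh(W_c x_t + b_c)               candidate from the input only
        h_t = z_t * h_{t-1} + (1 - z_t) * c_t   weighted average
    """
    W_z, b_z, W_c, b_c = params
    # Gates and candidates do not depend on the hidden state, so they can be
    # computed for every time step before the recurrence starts.
    gates = [[sigmoid(v) for v in linear(W_z, b_z, x)] for x in xs]
    cands = [[math.tanh(v) for v in linear(W_c, b_c, x)] for x in xs]
    h = [0.0] * len(b_z)
    outputs = []
    for z, c in zip(gates, cands):
        # Only this element-wise weighted average is sequential.
        h = [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h, c)]
        outputs.append(h)
    return outputs

def bidirectional_s_gru(xs, fwd_params, bwd_params):
    """Concatenate forward and backward hidden states for each video unit."""
    fwd = s_gru_pass(xs, fwd_params)
    bwd = s_gru_pass(xs[::-1], bwd_params)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]

def init_params(input_dim, hidden_dim, rng):
    """Create small random weights and zero biases (illustrative only)."""
    def mat():
        return [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)]
                for _ in range(hidden_dim)]
    return (mat(), [0.0] * hidden_dim, mat(), [0.0] * hidden_dim)
```

Under these assumptions, a sequence of five 4-dimensional unit features passed through `bidirectional_s_gru` with hidden size 3 yields five 6-dimensional context vectors, one per video unit. Because the gates never read the hidden state, the matrix products can be batched across time, which is one way to interpret the parallelism claim in the abstract.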
Keywords/Search Tags: vanishing gradient, residual network, recurrent network, attention network, action recognition