Research and Implementation of Human Action Recognition Based on Temporal and Spatial Relationship Enhancement

Posted on: 2020-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: S D Yang
GTID: 2428330590496468
Subject: Computer Science and Technology
Abstract/Summary:
In an era of rapidly developing information technology, video has gradually become a main carrier of data. Understanding video is therefore an important research field in artificial intelligence and has attracted the attention of scholars in various fields. Human action recognition can be applied to many areas, such as intelligent monitoring, group activity analysis, and video abstraction. Human action recognition (also known as behavior recognition) refers to determining the type of action contained in a given video sequence, such as running, shaking hands, or boxing. However, unlike image-based recognition and object detection, video data is more complex, and its frames are extremely similar. Processing the frames directly can easily lead to over-fitting and consumes a large amount of computational resources. Designing an efficient human action recognition algorithm has therefore become an urgent problem. Based on convolutional neural networks and the attention mechanism in deep learning, this thesis designs two new action recognition algorithms from the perspective of the temporal and spatial relationships of actions. The experimental results show that the proposed structures effectively improve the accuracy of action recognition. In addition, a series of extended experiments were carried out to explore the effect of different factors on human action recognition. The main work of this thesis is as follows:

1. The traditional data preprocessing method is optimized. The frame-extraction step is discarded, which reduces decoding redundancy, greatly reduces hard disk storage pressure, and reduces data-read deadlocks. The proposed method loads data online: the video file itself serves as the sample, and a pointer slides over it to read the required key frames, which greatly improves computational efficiency and reduces training time. To address the high migration cost caused by data fragmentation, the optical flow features are packaged in the HDF5 file format, which improves data portability and I/O utilization.

2. Since actions unfold over time, different actions have different patterns of motion change in the time series. Based on this, this thesis designs a variety of temporal capture modules to explore their effect on the accuracy of action recognition. Through extensive experimental analysis, it proposes a Multi-Head Sigmoid Self-Attention model for capturing action sequences. The model uses Sigmoid to activate the attention values, which reduces the strong competition between temporal features, and combines this with a Multi-Head structure to learn the relations of multiple templates and generate more robust features. Experiments on the HMDB51 dataset show that the proposed module effectively improves the accuracy of human action recognition.

3. Actions are often formed by interactions between multiple objects. There are strong connections between the objects, and different image regions contribute unequally to understanding the action. This thesis proposes a spatial attention network to capture the spatial relationships between objects, and combines it with a residual structure to design a variety of activation methods. Experiments show that the spatial attention network effectively improves the accuracy of human action recognition.

4. Some training techniques from neural network training are transferred to the field of action recognition to explore the effects of activation functions, training strategies, network depth, and other factors on human action recognition.
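
The multi-head sigmoid self-attention idea in item 2 can be illustrated with a minimal NumPy sketch. This is not the thesis's implementation: the function name, the weight matrices, the scaling by the square root of the head size, and the absence of any further normalization are all assumptions made for illustration. The only point it demonstrates is that replacing the usual softmax with an element-wise sigmoid lets each frame-to-frame weight lie in (0, 1) independently, so the weights in a row do not have to sum to 1 and temporal features do not compete for a fixed budget.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def multi_head_sigmoid_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Illustrative sketch (hypothetical names, not the thesis code).

    x:   (T, D) array, one feature vector per sampled frame.
    w_*: (D, D) projection matrices (assumed shapes).
    Returns the (T, D) output and the (H, T, T) attention weights.
    """
    t, d = x.shape
    if d % num_heads != 0:
        raise ValueError("feature size must be divisible by num_heads")
    d_head = d // num_heads

    def split_heads(m):
        # (T, D) -> (H, T, d_head)
        return m.reshape(t, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(x @ w_q)
    k = split_heads(x @ w_k)
    v = split_heads(x @ w_v)

    # Scaled dot-product scores between every pair of frames, per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (H, T, T)

    # Sigmoid instead of softmax: each weight is in (0, 1) on its own,
    # and a row is not forced to sum to 1.
    attn = sigmoid(scores)

    context = attn @ v  # (H, T, d_head)
    merged = context.transpose(1, 0, 2).reshape(t, d)  # back to (T, D)
    return merged @ w_o, attn


# Usage with random stand-in features: 8 frames, 16-dim features, 4 heads.
rng = np.random.default_rng(0)
frames = rng.standard_normal((8, 16))
weights = [rng.standard_normal((16, 16)) * 0.1 for _ in range(4)]
output, attention = multi_head_sigmoid_self_attention(
    frames, *weights, num_heads=4
)
```

Because the sigmoid weights are not normalized, the magnitude of the output grows with the number of frames; a real model would likely need some form of rescaling, which the abstract does not specify.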
Keywords/Search Tags: Human Action Recognition, Deep Learning, Attention Mechanism, Temporal Relationship, Spatial Relationship