
Research On Human Action Recognition Technology Based On LSTM

Posted on: 2022-02-17
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Liu
GTID: 2518306335451914
Subject: Control Engineering
Abstract/Summary:
Video-based human behavior recognition is an active research direction in computer vision. It has broad application prospects in intelligent security, human-computer interaction, video retrieval, and other areas. The task faces several problems, including the difficulty of modeling the interaction between spatial and temporal features, redundancy among video frames, and environmental noise. This thesis proposes a Spatial-Temporal enhanced Long Short-Term Memory algorithm (STA-LSTM) for video behavior recognition. The network consists mainly of an attention module, 3D convolution, and an LSTM network. The main work is as follows:

(1) To address the difficulty of spatial-temporal feature interaction, a C3D-based feature extraction network is proposed and pre-trained on the Sports-1M dataset, which allows it to extract spatial-temporal features from video efficiently. The key difference between a 3D CNN and a 2D CNN is that the 3D CNN adds a time dimension. It takes a stack of several video frames as input, so convolving over consecutive frames extracts spatial and temporal features simultaneously.

(2) To address redundancy among video frames, a spatiotemporal attention model (STA) is proposed. It captures the correlations of a video in time and space and focuses attention on key frames and on key regions within those frames. In the temporal dimension, Global Maximum Pooling (GMP) produces a one-dimensional vector, and the temporal attention weight vector is computed from it. In the spatial dimension, GMP and Global Average Pooling (GAP) are applied across channels, and the spatial attention weight matrix is computed from the pooled maps.

(3) To address the limited temporal information captured by 3D convolution, an LSTM network extracts secondary features from the sequence of 3D CNN features. Each input to the 3D convolutional network covers only a few consecutive frames, so the temporal features it extracts are not sufficient for accurate behavior classification. A Long Short-Term Memory (LSTM) network is therefore applied to the feature sequence produced by the 3D convolution. This captures the long-term temporal features used for classification.

(4) To address environmental noise, a Gaussian Mixture Model (GMM) is used for background modeling. It effectively removes background noise and yields pedestrian information free of interference.

STA-3D-LSTM is trained and tested on UCF-101 and HMDB51. The results show that the proposed algorithm improves the discriminative ability of the network, which indicates the advantages of the proposed algorithm.
Keywords/Search Tags: Behavior recognition, 3D convolution, LSTM, Attention mechanism
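Point (1) contrasts 3D convolution with 2D convolution. The following is a minimal NumPy sketch of that idea, not the thesis's C3D code. It implements a single-channel 3D cross-correlation with stride 1 and no padding. The kernel spans three frames as well as 3x3 pixels, which matches the 3x3x3 kernel size used in C3D. The clip size, the hand-made temporal-difference kernel, and the function name `conv3d_single` are illustrative assumptions. Because the kernel reaches across frames, it can respond to change over time, which a 2D kernel applied to a single frame cannot see.

```python
import numpy as np

def conv3d_single(clip, kernel):
    """Stride-1, unpadded 3D cross-correlation of a single-channel clip.

    clip:   (T, H, W) stack of T grayscale frames
    kernel: (kt, kh, kw), spanning kt consecutive frames
    returns (T - kt + 1, H - kh + 1, W - kw + 1)
    """
    T, H, W = clip.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                # Each output value mixes a kt x kh x kw block that spans several frames.
                out[t, y, x] = np.sum(clip[t:t + kt, y:y + kh, x:x + kw] * kernel)
    return out

# Hypothetical 3x3x3 kernel: (center pixel of frame t+2) minus (center pixel of frame t).
kernel = np.zeros((3, 3, 3))
kernel[0, 1, 1] = -1.0
kernel[2, 1, 1] = 1.0

static = np.ones((16, 8, 8))                                  # 16 identical frames
ramp = np.arange(16.0)[:, None, None] * np.ones((16, 8, 8))   # intensity rises by 1 per frame

print(conv3d_single(static, kernel).shape)          # (14, 6, 6)
print(np.abs(conv3d_single(static, kernel)).max())  # 0.0
print(conv3d_single(ramp, kernel).min())            # 2.0
```

The kernel gives zero response on the static clip and a constant response of 2 on the ramp, because it compares frames two steps apart. A deep-learning framework would learn many such kernels per layer rather than hand-coding them.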
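Point (2) says the temporal weights come from a GMP vector and the spatial weights come from channel-wise GMP and GAP. It does not say how the pooled values become weights. The NumPy sketch below fills that gap with placeholder choices, which are marked as assumptions in the code:

- The temporal weights are a softmax over the per-time-step GMP values.
- The spatial weights are a sigmoid of a fixed weighted sum of the two channel-pooled maps.

A trained model would learn these mappings. The function and parameter names (`temporal_attention`, `spatial_attention`, `apply_sta`, `w_max`, `w_avg`) are hypothetical.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def temporal_attention(feats):
    """feats: (T, C, H, W) clip features. Returns T weights that sum to 1."""
    pooled = feats.max(axis=(1, 2, 3))   # GMP per time step -> 1-D vector of length T
    return softmax(pooled)               # ASSUMPTION: softmax as the weighting operation

def spatial_attention(feats, w_max=0.5, w_avg=0.5):
    """feats: (T, C, H, W). Returns (T, H, W) weights in (0, 1)."""
    gmp = feats.max(axis=1)              # GMP across channels -> (T, H, W)
    gap = feats.mean(axis=1)             # GAP across channels -> (T, H, W)
    # ASSUMPTION: a fixed weighted sum + sigmoid stands in for a learned mapping.
    return sigmoid(w_max * gmp + w_avg * gap)

def apply_sta(feats):
    t_w = temporal_attention(feats)      # (T,)
    s_w = spatial_attention(feats)       # (T, H, W)
    # Broadcast: scale each time step by its temporal weight, each location by its spatial weight.
    return feats * t_w[:, None, None, None] * s_w[:, None, :, :]

feats = np.zeros((8, 4, 5, 5))           # 8 time steps, 4 channels, 5x5 feature maps
feats[3, :, 2, 2] = 5.0                  # one strong response: time step 3, location (2, 2)

t_w = temporal_attention(feats)
print(t_w.argmax())                          # 3
print(spatial_attention(feats)[3].argmax())  # 12, i.e. row 2, column 2 of the 5x5 map
print(apply_sta(feats).shape)                # (8, 4, 5, 5)
```

Both attention maps pick out the time step and location of the strong response. Multiplying the features by the two weight sets is one way to emphasize key frames and key regions. The abstract does not specify whether STA applies the two weights jointly or in sequence.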
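Point (3) feeds the sequence of 3D CNN features into an LSTM. The sketch below shows a standard single-layer LSTM forward pass in NumPy over a sequence of per-clip feature vectors. Several parts are hypothetical stand-ins rather than details from the thesis:

- the clip features are random substitutes for C3D outputs;
- the weights are untrained;
- the 32-dimensional input, the 16-dimensional hidden state, and the `LSTM` class are illustrative choices.

The 101-class linear head only mirrors the number of classes in UCF-101.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTM:
    """Single-layer LSTM forward pass with the standard gate equations (untrained weights)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_dim)
        # Stacked weights for the input, forget, output and candidate blocks.
        self.W = rng.uniform(-scale, scale, size=(4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def forward(self, seq):
        """seq: (num_clips, input_dim). Returns the final hidden state, shape (hidden_dim,)."""
        H = self.hidden_dim
        h, c = np.zeros(H), np.zeros(H)
        for x in seq:
            z = self.W @ np.concatenate([x, h]) + self.b
            i = sigmoid(z[:H])            # input gate
            f = sigmoid(z[H:2 * H])       # forget gate
            o = sigmoid(z[2 * H:3 * H])   # output gate
            g = np.tanh(z[3 * H:])        # candidate cell update
            c = f * c + i * g             # cell state carries information across clips
            h = o * np.tanh(c)
        return h

rng = np.random.default_rng(1)
clip_feats = rng.normal(size=(6, 32))     # stand-in for 6 per-clip 3D CNN feature vectors
lstm = LSTM(input_dim=32, hidden_dim=16)
h_last = lstm.forward(clip_feats)

num_classes = 101                         # UCF-101 has 101 action classes
W_cls = rng.normal(scale=0.1, size=(num_classes, 16))   # hypothetical untrained linear head
logits = W_cls @ h_last
print(h_last.shape, logits.shape)         # (16,) (101,)

# Unlike averaging, the LSTM summary depends on clip order.
print(np.allclose(h_last, lstm.forward(clip_feats[::-1])))
```

The cell state `c` lets information from early clips influence the final hidden state. This corresponds to the long-term temporal feature that a single 3D convolution over a short clip cannot provide.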
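Point (4) names a Gaussian Mixture Model for background modeling but not the variant or its parameters. The sketch below is a simplified per-pixel mixture for grayscale frames, in the style of Stauffer and Grimson's adaptive background mixture model:

- Each pixel keeps K Gaussians.
- A pixel is foreground when its value matches none of the high-weight, low-variance components.
- Matched components drift toward new observations.

Several parts are assumptions for illustration: the single learning rate `alpha` shared by weights, means and variances, the thresholds, the class name `GMMBackground`, and the synthetic frames. In practice a library implementation such as OpenCV's `cv2.createBackgroundSubtractorMOG2` is the usual choice.

```python
import numpy as np

class GMMBackground:
    """Simplified per-pixel Gaussian mixture background model for grayscale frames."""

    def __init__(self, k=3, alpha=0.05, init_var=225.0, min_var=4.0,
                 bg_ratio=0.7, match_sigmas=2.5):
        self.k, self.alpha = k, alpha
        self.init_var, self.min_var = init_var, min_var
        self.bg_ratio, self.match_sigmas = bg_ratio, match_sigmas
        self.mean = None

    def apply(self, frame):
        """Update the model with one frame; return a boolean foreground mask."""
        x = np.asarray(frame, dtype=float)[..., None]           # (H, W, 1)
        if self.mean is None:                                    # first frame seeds component 0
            shape = x.shape[:2] + (self.k,)
            self.mean = np.zeros(shape)
            self.mean[..., 0] = x[..., 0]
            self.var = np.full(shape, self.init_var)
            self.weight = np.zeros(shape)
            self.weight[..., 0] = 1.0
            return np.zeros(x.shape[:2], dtype=bool)

        fitness = self.weight / np.sqrt(self.var)                # rank components by w / sigma
        diff = x - self.mean
        matched = np.abs(diff) < self.match_sigmas * np.sqrt(self.var)
        any_match = matched.any(axis=-1)
        best = np.argmax(np.where(matched, fitness, -np.inf), axis=-1)

        # Background = highest-fitness components until their cumulative weight passes bg_ratio.
        order = np.argsort(-fitness, axis=-1)
        w_sorted = np.take_along_axis(self.weight, order, axis=-1)
        bg_sorted = (np.cumsum(w_sorted, axis=-1) - w_sorted) < self.bg_ratio
        is_bg = np.zeros_like(bg_sorted)
        np.put_along_axis(is_bg, order, bg_sorted, axis=-1)
        best_is_bg = np.take_along_axis(is_bg, best[..., None], axis=-1)[..., 0]
        foreground = ~(any_match & best_is_bg)

        # Matched component moves toward x; all weights decay, the matched one is reinforced.
        hit = (np.arange(self.k) == best[..., None]) & any_match[..., None]
        a = self.alpha
        self.weight = (1 - a) * self.weight + a * hit
        self.mean = self.mean + a * hit * diff
        self.var = np.maximum(self.var + a * hit * (diff ** 2 - self.var), self.min_var)

        # Pixels with no match replace their weakest component with a new Gaussian at x.
        worst = np.argmin(fitness, axis=-1)
        new = (np.arange(self.k) == worst[..., None]) & ~any_match[..., None]
        self.mean = np.where(new, x, self.mean)
        self.var = np.where(new, self.init_var, self.var)
        self.weight = np.where(new, a, self.weight)
        self.weight /= self.weight.sum(axis=-1, keepdims=True)
        return foreground

bg = np.full((20, 20), 100.0)
model = GMMBackground()
for _ in range(30):
    model.apply(bg)                       # learn a static background
frame = bg.copy()
frame[5:10, 5:10] = 250.0                 # a bright "pedestrian" region appears
mask = model.apply(frame)
print(mask.sum())                         # 25
print(model.apply(bg).any())              # False
```

Here the bright region is present for only one frame, so it is flagged and then dropped once the pixels return to the background value. An object that stays still long enough gains weight and is eventually absorbed into the background. This is a known trade-off of this model family, controlled by `alpha`.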