
Research On Video Human Action Recognition Based On Pose Sequence

Posted on: 2021-10-19
Degree: Master
Type: Thesis
Country: China
Candidate: C Huang
Full Text: PDF
GTID: 2428330647455383
Subject: Software engineering
Abstract/Summary:
With the development of image sensor technology and the growing popularity of high-definition cameras, intelligent video understanding, which serves as the "eye" through which machines dynamically perceive the 3D world, has attracted increasing attention. Compared with a single static image, a video stream contains more motion information. Human action recognition from video streams is a key technology in intelligent video understanding; it is the process of encoding and classifying feature vectors of human behavior across the frames of a video sequence. Recent progress in deep learning has overturned traditional video human action recognition techniques, because deep networks can adaptively learn high-level abstract features from video sequences. Unlike other kinds of feature information, pose features are clear and simple and are not easily affected by appearance factors. Some studies show that feature extraction based on deep neural networks has a significant effect on pose-estimation-based human action recognition. Therefore, starting from the problem of human pose estimation, this thesis uses deep neural network models to recognize actions from human pose sequences and validates them experimentally. The details are as follows:
(1) A top-down multi-person pose estimation method based on a fully convolutional network with channel attention and multi-scale feature fusion is proposed, which effectively improves the accuracy and speed of multi-person pose estimation in complex scenes. As a human pose estimation network downsamples the feature map, high-resolution information from the upper layers is progressively lost. To address this problem, a multi-scale feature fusion module is embedded in the classic U-shaped human pose estimation network, so that the low-scale features in the network also contain high-resolution information. To further highlight the key channels of the multi-scale fused feature map, a channel attention mechanism is introduced into the feature fusion module. Experiments demonstrate the effectiveness and superiority of the multi-person pose estimation method.
(2) A dual-stream LSTM action recognition network consisting of a temporal stream and a spatial stream is proposed. In the temporal-stream branch, the original LSTM model is extended with a global context attention unit that selectively focuses on the skeleton frames carrying key dynamic information. In the spatial-stream branch, a spatial attention module with a skeleton key point selection mechanism is added to the backbone LSTM network, so that the network can adaptively assign weights to different bones and selectively focus on the key points that matter most for distinguishing different actions. Experiments demonstrate the effectiveness and superiority of the dual-stream LSTM network model.
(3) An ALT-GCN network model is proposed that combines key point graph convolution with limb graph convolution. To make better use of skeleton limb information, a set of operations distinct from skeleton key point graph convolution is constructed by defining neighborhood relationships between limb edges. The ALT-GCN module replaces the basic ST-GCN module. As the network deepens, it can learn better topological connections and optimize the hierarchical structure of the graph convolution, which helps it better identify different action types. Experiments demonstrate the effectiveness and superiority of the ALT-GCN network model.
In summary, this thesis studies the key technologies of video action recognition based on human pose sequences and proposes effective solutions to the difficult problems that span human pose estimation, pose feature encoding, and classification, providing a basis for both research on and practical application of video human action recognition algorithms.
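To make the key point graph convolution in (3) more concrete, the sketch below shows one spatial graph convolution step over a skeleton, in the simplified normalized-adjacency form commonly used to explain GCN layers. It is not the thesis's ALT-GCN module, and it omits the limb graph convolution and the temporal convolution. The 5-joint skeleton, the function names `normalized_adjacency` and `graph_conv`, and all tensor sizes are assumptions chosen for illustration.

```python
import numpy as np


def normalized_adjacency(num_joints, bones):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph.

    `bones` is a list of (joint_i, joint_j) index pairs (hypothetical layout).
    """
    a = np.eye(num_joints)  # self-loops keep each joint's own feature
    for i, j in bones:
        a[i, j] = 1.0
        a[j, i] = 1.0
    degree = a.sum(axis=1)
    d_inv_sqrt = np.diag(degree ** -0.5)
    return d_inv_sqrt @ a @ d_inv_sqrt


def graph_conv(x, a_hat, w):
    """Apply one spatial graph convolution to every frame.

    x:     (frames, joints, in_channels) pose-sequence features
    a_hat: (joints, joints) normalized adjacency
    w:     (in_channels, out_channels) weight matrix (learned in practice)
    """
    # Aggregate features from neighbouring joints, then project the channels.
    return np.einsum("uv,tvc,cd->tud", a_hat, x, w)


# Toy skeleton (assumed): joint 1 is a hub connected to joints 0, 2, 3 and 4.
bones = [(0, 1), (1, 2), (1, 3), (1, 4)]
a_hat = normalized_adjacency(5, bones)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5, 3))  # 4 frames, 5 joints, 3 input channels (x, y, score)
w = rng.normal(size=(3, 8))     # random weights stand in for trained parameters
out = graph_conv(x, a_hat, w)   # shape: (4, 5, 8)
```

Each output joint feature is a degree-normalized mix of its own feature and those of the joints it shares a bone with. A limb-based variant, as described in (3), would need its own neighborhood definition over edges rather than joints.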
Keywords/Search Tags: Video action recognition, Graph convolution, Pose estimation, Attention mechanism, Dual-stream LSTM network