
Video Human Action Recognition Based On Spatial-temporal CNN Framework

Posted on: 2020-03-20
Degree: Master
Type: Thesis
Country: China
Candidate: H Yu
Full Text: PDF
GTID: 2428330596471426
Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the era of big data, applications in fields such as video surveillance, behavior analysis, and video retrieval have increasingly clear prospects. However, human action recognition still faces many challenges, arising both from the diversity of videos, such as complex background environments and changes of spatial viewpoint, and from the complexity of human actions themselves, such as intra-class and inter-class differences between actions. To address these problems, this dissertation proposes a human action recognition algorithm that fuses improved temporal and spatial convolutional neural networks (spatial-temporal CNN, STCNN) to capture both the spatial and the temporal features of human actions in videos; the temporal and spatial streams both use the same improved CNN framework. The main research work is as follows:

First, the optical-flow information of human motion extracted by the improved dense trajectories (iDT) algorithm is used as the input of the temporal CNN stream. During feature extraction through the temporal and spatial networks, the spatial stream takes the RGB image data as input, while the temporal stream, combined with the iDT algorithm, takes the data of consecutive optical-flow fields as input. A traditional optical-flow algorithm extracts the trajectories of human motion across multiple video frames; Histogram of Oriented Gradients (HOG), Histogram of Optical Flow (HOF), and Motion Boundary Histogram (MBH) features are then computed around each trajectory, locating the salient regions of human action so that the motion trajectories can be tracked.
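The HOF descriptor mentioned above quantizes optical-flow vectors into an orientation histogram weighted by flow magnitude. The thesis does not give implementation details, so the following is a minimal NumPy sketch of that idea; the function name and the bin count of 8 are illustrative assumptions, not from the original work.

```python
import numpy as np

def flow_histogram(flow, n_bins=8):
    """HOF-style descriptor: bin flow vectors by orientation, weighted by magnitude.

    flow : array of shape (H, W, 2) holding (dx, dy) per pixel.
    Returns an L1-normalized histogram of length n_bins.
    """
    dx, dy = flow[..., 0].ravel(), flow[..., 1].ravel()
    mag = np.hypot(dx, dy)                            # flow magnitude per pixel
    ang = np.arctan2(dy, dx) % (2 * np.pi)            # orientation in [0, 2*pi)
    bins = (ang / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins, weights=mag, minlength=n_bins)
    total = hist.sum()
    return hist / total if total > 0 else hist

# Example: a flow field where every vector points right falls entirely into bin 0.
uniform_right = np.zeros((4, 4, 2))
uniform_right[..., 0] = 1.0
h = flow_histogram(uniform_right)  # → bin 0 holds all the mass
```

In the iDT pipeline such histograms are computed in small spatio-temporal cells around each tracked trajectory and concatenated, rather than over the whole frame as in this sketch.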
Second, a spatial transformer network (STN) module based on affine transformations is added to the traditional CNN framework. A traditional CNN lacks spatial invariance when processing input data: its pooling mechanism provides no spatial transformation ability, such as translation or rotation, for recognizing the target as a whole, and only small transformations are absorbed when processing deep feature maps. Therefore, the STN is merged into the CNN framework, and the input video frames are spatially transformed by the STN through a series of operations such as rotation and scaling. After the first convolutional layer, five STN structures are placed, extracting five regions of interest corresponding to different parts of the human body. The transformed feature maps then undergo high-level feature extraction through the subsequent convolutional and pooling layers.

Experiments are carried out on the standard human action datasets HMDB51 and UCF101. The results show that the recognition rate of the algorithm is 90.5% on the HMDB51 dataset and 62.2% on the UCF101 dataset, and that the proposed algorithm achieves better recognition results than other human action recognition algorithms.
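The STN described above follows the standard spatial-transformer design: a small localization network predicts six affine parameters, which are turned into a sampling grid that warps the input differentiably. The thesis does not specify layer sizes, so the sketch below is a minimal single-module PyTorch illustration with assumed dimensions (64×64 input); the last layer is initialized to the identity transform so that training starts from an unwarped image.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class STN(nn.Module):
    """Minimal spatial transformer: localization net -> affine grid -> sampling."""

    def __init__(self):
        super().__init__()
        # Localization network: regresses features used to predict the transform.
        self.loc = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=7), nn.MaxPool2d(2), nn.ReLU(),
            nn.Conv2d(8, 10, kernel_size=5), nn.MaxPool2d(2), nn.ReLU(),
        )
        # For a 3x64x64 input the localization output is 10x12x12 = 1440 features.
        self.fc = nn.Sequential(
            nn.Linear(10 * 12 * 12, 32), nn.ReLU(), nn.Linear(32, 6),
        )
        # Initialize the final layer to the identity affine transform.
        self.fc[-1].weight.data.zero_()
        self.fc[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float)
        )

    def forward(self, x):
        theta = self.fc(self.loc(x).flatten(1)).view(-1, 2, 3)
        # Build a sampling grid from the predicted affine matrix and warp x.
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)
```

The dissertation places five such modules after the first convolutional layer, one per body-part region of interest; the sketch shows only one, applied directly to the input frame for clarity.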
Keywords/Search Tags: Human action recognition, convolutional neural network, spatial affine transformation, dense trajectory algorithm