Research On Action Recognition Algorithm Based On 3D Convolution

Posted on:2021-03-06

Degree:Master

Type:Thesis

Country:China

Candidate:Q Hu

GTID:2518306497957639

Subject:Information and Communication Engineering

Abstract/Summary:

Action recognition is an active research area in computer vision. Traditional action recognition algorithms are complex to design and not robust, and algorithms based on deep learning have attracted much attention from researchers. Although current deep-learning-based action recognition algorithms achieve good recognition results, some problems remain difficult to solve: complex backgrounds, the large number of parameters in 3D convolution models, and insufficient acquisition of long-term features. This thesis studies action recognition algorithms in depth to address these problems. The main research work is as follows:

(1) A separable spatio-temporal convolution residual network based on frame appearance and inter-frame relations was proposed. A feature extraction network based on 3D convolution can be trained and tested directly end to end, but its number of parameters is large. To reduce the model parameters, 2D convolution is introduced into the residual block of the 3D convolution residual network. Mixed residual blocks that combine 2D and 3D convolution in series, in parallel, and in series-parallel were designed. Model comparison experiments were performed on the UCF101 and Mini-Kinetics-200 datasets. The experimental results show that the residual network with parallel mixed residual blocks achieves the best recognition performance and captures more spatio-temporal information. In the parallel mixed residual block, the 2D convolution branch obtains the appearance information of the video frames, and square pooling is introduced in the 3D convolution branch to obtain the relationships between video frames. To further improve recognition accuracy, a separable spatio-temporal convolution replaces the three-dimensional convolution, thereby adding non-linearity. On this basis, an Appearance and inter-frame Relations Separable spatio-temporal convolution Residual Network (ARSRNet) was proposed, and the effectiveness of the model improvements was verified on two public datasets. Experimental results show that the ARSRNet model reaches a recognition accuracy of 90.8% on the UCF101 dataset after pre-training.

(2) An action recognition algorithm based on temporal multi-scale features and an attention mechanism was proposed. To address insufficient extraction of long-term video features, a temporal multi-scale mechanism is introduced into the residual block. Convolution kernels with different temporal lengths are used to obtain the short-term, mid-term, and long-term information of videos, which is then fused. At the same time, a channel attention mechanism is applied to the output of the network's residual block to learn the importance of each feature channel during training, so that useful features are strengthened and useless features are suppressed. On this basis, an Appearance and inter-frame Relations Separable spatio-temporal convolution Residual Network based on temporal Multi-scale features and an Attention mechanism (ARSRNet-MA) was proposed. The experimental results show that the temporal multi-scale module and the attention module effectively improve the overall recognition accuracy on the two public datasets, which verifies the effectiveness of the improvements. The proposed ARSRNet-MA model reaches a recognition accuracy of 91.7% on the UCF101 dataset after pre-training.

(3) A short-video sharing and classification system based on ARSRNet-MA was designed and implemented. The system mainly provides short-video sharing, classification, collection, and personal center functions. After a user uploads a video, the ARSRNet-MA algorithm proposed in this thesis performs the system's video classification. The implementation of the system verifies the feasibility of the algorithm.
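The parameter saving that motivates replacing a full 3D convolution with a separated spatial and temporal convolution can be illustrated with simple arithmetic. The sketch below is not taken from the thesis: the layer sizes (64 channels, 3 x 3 x 3 kernel), the intermediate channel count, and the function names are assumptions chosen only for illustration, and the abstract does not state how ARSRNet actually sizes its layers.

```python
def conv3d_params(c_in, c_out, kt, kh, kw, bias=False):
    """Weight count of a full 3D convolution with a kt x kh x kw kernel."""
    return c_in * c_out * kt * kh * kw + (c_out if bias else 0)


def separated_params(c_in, c_mid, c_out, kt, kh, kw, bias=False):
    """Weight count when the kernel is split into a 1 x kh x kw spatial
    convolution (c_in -> c_mid) followed by a kt x 1 x 1 temporal
    convolution (c_mid -> c_out)."""
    spatial = conv3d_params(c_in, c_mid, 1, kh, kw, bias)
    temporal = conv3d_params(c_mid, c_out, kt, 1, 1, bias)
    return spatial + temporal


# Hypothetical layer (assumed sizes): 64 -> 64 channels, 3 x 3 x 3 kernel,
# with 64 intermediate channels between the spatial and temporal steps.
full = conv3d_params(64, 64, 3, 3, 3)
split = separated_params(64, 64, 64, 3, 3, 3)
print(full, split, round(split / full, 3))  # prints: 110592 49152 0.444
```

Under these assumed sizes the separated form needs fewer than half the weights of the full 3D kernel. Splitting the kernel also leaves room for an activation function between the spatial and temporal steps, which is one plausible reading of the abstract's remark about increasing non-linear factors.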

Keywords/Search Tags:

Action recognition, Separation spatio-temporal convolution, Residual network, 3D convolution, Attention mechanism

Related items

1	Video Action Recognition Based On 2D Convolution Network Under Spatio-Temporal Feature Enhancement Mechanism
2	Action Recognition Based On Convolution Recurrent Neural Network With Attention Mechanism
3	Research For Action Recognition Based On Spatial-Temporal Stream Convolution Neural Networks
4	Human Skeleton Action Recognition Based On Spatiotemporal Graph Attention Convolution Network
5	Attention Mechanism Based Action Recognition
6	Research On Human Action Recognition Based On Spatio-temporal Graph Convolutional Neural Network
7	Research On Graph Convolution Neural Network Based On Multi-attention Mechanism For Human Action Recognition
8	Action Recognition Based On Human Skeleton Graph Convolution And Image Convolution Fusion
9	Human Action Recognition Based On Spatio-temporal Feature
10	Research On Algorithm Of Human Action Recognition Based On Video