Research On Action Recognition Algorithm Based On 3D Convolution

Posted on: 2021-03-06
Degree: Master
Type: Thesis
Country: China
Candidate: Q Hu
Full Text: PDF
GTID: 2518306497957639
Subject: Information and Communication Engineering
Abstract/Summary:
Action recognition is an active research topic in computer vision. Traditional action recognition algorithms are complex to design and not robust. Action recognition algorithms based on deep learning have therefore attracted much attention from researchers. Although current deep-learning-based algorithms achieve good recognition results, some problems remain difficult to solve: complex backgrounds, the large number of parameters in 3D convolution models, and insufficient acquisition of long-term features. This thesis studies action recognition algorithms in depth with these problems in view. The main research work is as follows:

(1) A separable spatio-temporal convolution residual network based on frame appearance and inter-frame relations is proposed. A feature extraction network based on 3D convolution can be trained and tested end to end, but it has a large number of parameters. To reduce the number of parameters, 2D convolution is introduced into the residual block of the 3D convolution residual network. Mixed residual blocks that combine 2D and 3D convolution in series, in parallel, and in series-parallel are designed. Model comparison experiments were performed on the UCF101 and Mini-Kinetics-200 datasets. The results show that the residual network with parallel mixed residual blocks achieves the best recognition performance and captures more spatio-temporal information. In the parallel mixed residual block, the 2D convolution branch captures the appearance information of the video frames, and square pooling is introduced in the 3D convolution branch to capture the relationships between video frames. To further improve recognition accuracy, separable spatio-temporal convolution is introduced in place of 3D convolution, which adds non-linearity. An Appearance and inter-frame Relations Separable spatio-temporal convolution Residual Network (ARSRNet) is proposed, and the effectiveness of the improvements is verified on the two public datasets. Experimental results show that the ARSRNet model achieves a recognition accuracy of 90.8% on the UCF101 dataset after pre-training.

(2) An action recognition algorithm based on temporal multi-scale features and an attention mechanism is proposed. To address insufficient extraction of long-term video features, a temporal multi-scale mechanism is introduced into the residual block. Convolution kernels with different temporal lengths extract the short-term, mid-term, and long-term information of a video, and the results are then fused. At the same time, a channel attention mechanism is applied to the output of each residual block; it learns the importance of each feature channel during network training, so that useful features are strengthened and useless features are suppressed. On this basis, an Appearance and inter-frame Relations Separable spatio-temporal convolution Residual Network based on temporal Multi-scale features and an Attention mechanism (ARSRNet-MA) is proposed. Experimental results show that the temporal multi-scale module and the attention module both improve the overall recognition accuracy on the two public datasets, which verifies the effectiveness of the improvements. The proposed ARSRNet-MA model achieves a recognition accuracy of 91.7% on the UCF101 dataset after pre-training.

(3) A short-video sharing and classification system based on ARSRNet-MA is designed and implemented. The system provides short-video sharing, classification, collection, and personal-center functions. After a user uploads a video, the ARSRNet-MA algorithm proposed in this thesis performs the system's video classification. The implementation of the system verifies the feasibility of the proposed algorithm.
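The parameter argument in item (1) can be illustrated with a rough weight count. The sketch below is not taken from the thesis: the channel width (64), the 3x3x3 kernel size, the absence of bias terms, and the helper names are all assumptions chosen for illustration. It compares a full 3D convolution, a 2D (spatial-only) convolution, and a separable spatio-temporal convolution, which is modeled here as a 1 x k x k spatial convolution followed by a k_t x 1 x 1 temporal convolution.

```python
# Illustrative weight counts for one convolution layer (biases ignored).
# All layer sizes below are assumed values, not figures from the thesis.

def conv3d_params(c_in, c_out, k_t, k_s):
    """Weights in a full 3D convolution with a k_t x k_s x k_s kernel."""
    return c_in * c_out * k_t * k_s * k_s

def conv2d_params(c_in, c_out, k_s):
    """Weights in a 2D (spatial-only) convolution with a k_s x k_s kernel."""
    return c_in * c_out * k_s * k_s

def separable_params(c_in, c_mid, c_out, k_t, k_s):
    """Weights in a spatial 1 x k_s x k_s convolution (c_in -> c_mid)
    followed by a temporal k_t x 1 x 1 convolution (c_mid -> c_out)."""
    spatial = c_in * c_mid * k_s * k_s
    temporal = c_mid * c_out * k_t
    return spatial + temporal

if __name__ == "__main__":
    c = 64  # assumed channel width
    print("3D conv:       ", conv3d_params(c, c, 3, 3))        # 110592
    print("2D conv:       ", conv2d_params(c, c, 3))           # 36864
    print("separable conv:", separable_params(c, c, c, 3, 3))  # 49152
```

Under these assumed sizes, the 2D layer uses one third of the weights of the 3D layer, and the separable layer uses fewer than half. The separable form also leaves room for an extra non-linearity between its two convolutions, which matches the abstract's remark about added non-linearity.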
Keywords: Action recognition, Separable spatio-temporal convolution, Residual network, 3D convolution, Attention mechanism