Research On Human Action Recognition Based On Spatiotemporal Feature Aggregation

Posted on:2023-07-29

Degree:Master

Type:Thesis

Country:China

Candidate:D Y Yang

Full Text:PDF

GTID:2558307070982589

Subject:Engineering

Abstract/Summary:

Human action recognition is an active research topic in computer vision and has attracted growing attention in recent years. However, action recognition networks are often computationally expensive and demand substantial computing power from the device, which greatly limits their practical application. In addition, real-world human activities take place in a wide variety of scenes, and human actions are flexible, diverse, and often similar to one another. Accurately recognizing multi-scale human actions across complex scenes and many categories therefore remains challenging. This thesis studies these difficulties in human action recognition. The specific research contents are as follows:

To address the high computational complexity and large number of parameters of existing human action recognition networks, a lightweight multi-scale depthwise convolution recognition network is proposed. The network uses a two-stream residual convolutional network as its baseline and applies a spatiotemporal separation convolution strategy and a depthwise convolution strategy to reduce the number of parameters and the amount of computation. To improve recognition accuracy, multi-scale convolution kernels are used in the convolution layers to capture multi-scale spatiotemporal features, and the impact of channel information interaction and multi-scale feature aggregation on network performance is explored. Extensive ablation and comparative experiments verify that the proposed network effectively reduces computational complexity while improving recognition accuracy.

To address the multi-scale problem of human actions in the temporal and spatial domains, a feature pyramid structure is designed on top of the two-stream convolutional network, using feature maps from three different levels to capture multi-scale spatiotemporal features. Two feature pyramid networks are designed, one from the perspective of the spatiotemporal feature aggregation flow direction and one from the perspective of multi-scale spatiotemporal feature consistency: the former uses top-down and bottom-up feature aggregation flows; the latter uses a shared convolution strategy when aggregating features at different scales. To explore the impact of different temporal and spatial features and different aggregation methods on network accuracy, a variety of output structures are designed. Finally, ablation and comparative experiments verify the effectiveness of the proposed feature pyramid networks and feature aggregation methods.
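
The parameter savings that the abstract attributes to spatiotemporal separation and depthwise convolution can be illustrated with simple weight counting. The sketch below is not the thesis's network: the block layouts, the function names, and the channel and kernel sizes are assumptions chosen for illustration, and biases and normalization layers are ignored.

```python
def conv3d_params(c_in, c_out, kt, kh, kw, groups=1):
    """Weight count of a 3D convolution layer (bias ignored)."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * c_out * kt * kh * kw


def standard_block(c_in, c_out, k=3):
    # One full k x k x k spatiotemporal convolution.
    return conv3d_params(c_in, c_out, k, k, k)


def spatiotemporal_separated_block(c_in, c_out, k=3):
    # Assumed layout: a 1 x k x k spatial convolution
    # followed by a k x 1 x 1 temporal convolution.
    spatial = conv3d_params(c_in, c_out, 1, k, k)
    temporal = conv3d_params(c_out, c_out, k, 1, 1)
    return spatial + temporal


def depthwise_separated_block(c_in, c_out, k=3):
    # Assumed layout: depthwise spatial and depthwise temporal convolutions
    # (one filter per channel), then a 1 x 1 x 1 pointwise convolution
    # that mixes information across channels.
    spatial = conv3d_params(c_in, c_in, 1, k, k, groups=c_in)
    temporal = conv3d_params(c_in, c_in, k, 1, 1, groups=c_in)
    pointwise = conv3d_params(c_in, c_out, 1, 1, 1)
    return spatial + temporal + pointwise


if __name__ == "__main__":
    c_in, c_out = 64, 64  # hypothetical channel widths
    print("standard 3D conv:        ", standard_block(c_in, c_out))
    print("spatiotemporal separated:", spatiotemporal_separated_block(c_in, c_out))
    print("depthwise + pointwise:   ", depthwise_separated_block(c_in, c_out))
```

Under these assumptions, each strategy reduces the weight count relative to the full 3D convolution, and the depthwise variant relies on the pointwise layer for the channel information interaction that the abstract mentions.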

Keywords/Search Tags:

Convolutional network, Human action recognition, Lightweight, Multi-scale, Feature pyramid

Related items

1	Human Action Recognition Based On Convolutional Neural Networks
2	Research On Human Action Recognition Method Based On 3D Convolutional Neural Network
3	Human Action Recognition In Videos Of Realistic Scenes Based On Multi-Scale CNN Feature
4	Research On Video Action Recognition Algorithm Based On Multi-scale Spatiotemporal Feature Extraction
5	Human Action Recognition Method Based On DenseNet And Multi-Scale Temporal Information
6	Research On Human Skeleton Action Recognition Based On Graph Convolutional Networks
7	Research On Human Action Recognition Algorithm Based On Two Stream Convolutional Neural Network
8	Human Action Recognition Based On Multi-Feature Fusion
9	Research On Human Detection And Action Recognition Based On Convolution Feature Deformable Part Model
10	Human Action Recognition Based On Multi-mode Feature Fusion