
Research On Human Action Recognition Based On Spatiotemporal Feature Aggregation

Posted on: 2023-07-29  Degree: Master  Type: Thesis
Country: China  Candidate: D Y Yang  Full Text: PDF
GTID: 2558307070982589  Subject: Engineering
Abstract/Summary:
Human action recognition is one of the research hotspots in computer vision and has attracted growing attention from researchers in recent years. However, human action recognition networks often require a very large amount of computation and place high demands on the computing power of the device, which greatly limits the practical application of action recognition. In addition, real-life human activities take place in a wide variety of scenes, and human actions are extremely flexible and diverse, while different actions can closely resemble one another. Accurately recognizing multi-scale human actions across complex scenes and diverse categories therefore remains challenging. This thesis studies these difficulties in human action recognition. The specific research contents are as follows:

To address the high computational complexity and large number of parameters of existing human action recognition networks, a lightweight multi-scale depthwise convolution recognition network is proposed. The network uses the two-stream residual convolutional network as its baseline, and applies a spatiotemporal separation convolution strategy and a depthwise convolution strategy to reduce the number of parameters and the amount of computation. To improve recognition accuracy, the thesis uses multi-scale convolution kernels in the convolution layers to capture multi-scale spatiotemporal features, and explores the impact of channel information interaction and multi-scale feature aggregation on network performance. Extensive ablation and comparative experiments verify that the proposed network effectively reduces computational complexity and improves recognition accuracy.

To address the multi-scale problem in the temporal and spatial domains of human action, a feature pyramid structure is designed on top of the two-stream convolutional network, using feature maps from three different levels to capture multi-scale spatiotemporal features. Two feature pyramid networks are designed, one from the perspective of the flow direction of spatiotemporal feature aggregation and one from the perspective of multi-scale spatiotemporal feature consistency: the former uses top-down and bottom-up feature aggregation flows; the latter uses a shared convolution strategy when aggregating features at different scales. To explore the impact of different temporal and spatial features and different aggregation methods on network accuracy, the thesis also designs a variety of output structures. Finally, ablation experiments and comparative experiments verify the recognition effectiveness of the proposed feature pyramid networks and feature aggregation methods.
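The parameter savings that the abstract attributes to spatiotemporal separation and depthwise convolution can be illustrated with simple weight counting. The sketch below is not the thesis's actual layer design, which the abstract does not specify: the channel widths (64 in, 64 out), the 3x3x3 kernel, and the particular decomposition (a depthwise spatial convolution, a depthwise temporal convolution, then a 1x1x1 pointwise convolution) are all assumptions chosen for illustration, and the function names are hypothetical.

```python
def conv3d_params(c_in, c_out, kt, kh, kw):
    """Weight count of a standard 3D convolution (bias omitted)."""
    return c_in * c_out * kt * kh * kw


def separated_depthwise_params(c_in, c_out, kt, kh, kw):
    """Weight count of an assumed factorized replacement (bias omitted).

    The kt x kh x kw kernel is split into:
      - a depthwise spatial conv  (1 x kh x kw), one filter per input channel
      - a depthwise temporal conv (kt x 1 x 1),  one filter per input channel
      - a pointwise conv          (1 x 1 x 1) that mixes channels
    """
    spatial = c_in * kh * kw
    temporal = c_in * kt
    pointwise = c_in * c_out
    return spatial + temporal + pointwise


# Assumed example sizes; not taken from the thesis.
c_in, c_out = 64, 64
standard = conv3d_params(c_in, c_out, 3, 3, 3)
separated = separated_depthwise_params(c_in, c_out, 3, 3, 3)
ratio = standard / separated

print(f"standard 3D conv:     {standard} weights")
print(f"separated + depthwise: {separated} weights")
print(f"reduction factor:      {ratio:.1f}x")
```

Under these assumed sizes the factorized layer needs 4,864 weights against 110,592 for the standard 3D convolution. The actual savings in the proposed network depend on its real channel widths, kernel sizes, and how channel interaction is implemented.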
Keywords/Search Tags:Convolutional network, Human action recognition, Lightweight, Multi-scale, Feature pyramid