
Research On Human Action Recognition Method In Video Sequence Based On Convolutional Neural Network

Posted on: 2024-08-15 | Degree: Master | Type: Thesis
Country: China | Candidate: S K Li | Full Text: PDF
GTID: 2568307100461104 | Subject: Mechanics (Professional Degree)
Abstract/Summary:
In recent years, with the rapid development of computer vision and artificial intelligence technologies and the improvement of hardware, action recognition in video has received increasing attention and has gradually become a hot topic in computer vision. The main task of action recognition is to extract effective temporal and spatial features from videos and to classify videos based on these features. Action recognition technology has broad application prospects in intelligent video surveillance, human-computer interaction, virtual reality, smart healthcare, and other fields. Early action recognition methods relied on hand-designed features, but such features generalize poorly and yield low recognition rates. In recent years, the rapid development of deep learning has brought significant advances in action recognition. Deep learning-based methods avoid most complex pre-processing steps and automatically learn effective features for video classification, achieving relatively good recognition performance. However, these methods still leave problems to be addressed, such as how to improve recognition accuracy for actions that depend strongly on temporal information, and how to effectively extract temporal and spatial information from videos. This paper therefore reviews and summarizes existing action recognition methods and proposes two new ones. The main work of this thesis covers the following two aspects:

(1) To accurately extract fine-grained motion features from video and improve recognition accuracy for actions that depend strongly on temporal information, we propose an improved action recognition network based on local frame difference extraction and global temporal excitation (LDCT) that captures both local and global spatiotemporal features. Specifically, we propose a Local Frame Difference Extraction Module (LDEM) and a Channel Temporal Excitation Module (CTM). The LDEM uses two branches to compute the spatial features and the local motion features of each video clip, then merges them to obtain the local spatiotemporal features of each segment. The CTM adaptively enhances channels that are sensitive to temporal information by modeling the interdependencies of channels over time. We conduct experiments on the mainstream action recognition datasets Something-Something V1 & V2 to validate the effectiveness of the LDCT network and compare the results with current action recognition methods, and we perform ablation experiments on the LDEM and CTM modules to verify their individual contributions.

(2) To strengthen the connection between spatial and temporal features during feature extraction and to effectively extract both short-range and long-range spatiotemporal information from video, we propose an action recognition network based on motion enhancement and spatiotemporal feature aggregation (MTSN). Specifically, we propose a Motion Excitation Module (MEM) and a Spatiotemporal Feature Aggregation Module (TSAM). The MEM is embedded into each feature extraction layer in the residual branch of the network; it performs a differential operation on the input features along the time dimension to obtain motion-sensitive weights, then excites the motion-sensitive channels to enhance short-range motion features. Based on the idea of group convolution, the TSAM splits the input features along the channel dimension, extracts spatiotemporal features from each group separately, and enlarges the receptive field of the network through a residual connection between the two groups. We conduct experiments on the Something-Something V1 dataset to validate the effectiveness of the MTSN network, compare the results with current action recognition methods, and perform ablation experiments on the MEM and TSAM modules.
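The frame-difference idea behind the LDEM can be sketched in a few lines. This is a minimal NumPy illustration, not the thesis's actual module: the function names, the tensor layout (T, C, H, W), the zero-padding of the motion branch, and the additive fusion are all assumptions standing in for learned convolutional branches.

```python
import numpy as np

def local_frame_difference(clip):
    """Approximate local motion features by differencing adjacent frames.

    clip: array of shape (T, C, H, W) -- T frames of one video segment.
    Returns motion features of shape (T-1, C, H, W).
    """
    return clip[1:] - clip[:-1]

def fuse_local_spatiotemporal(clip):
    """Two-branch fusion: a spatial branch keeps per-frame appearance,
    a motion branch uses frame differences; the branches are merged
    (here by simple addition after zero-padding the motion branch)."""
    spatial = clip
    motion = local_frame_difference(clip)
    # Pad the motion branch with a zero frame so both branches align in time.
    motion = np.concatenate([np.zeros_like(clip[:1]), motion], axis=0)
    return spatial + motion
```

In the real network each branch would be a learned convolution; here the point is only that differencing adjacent frames isolates what changed, while the spatial branch preserves what is there.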
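The CTM's channel re-weighting can be illustrated with a squeeze-and-excitation-style gate computed from temporal variation. This is a hedged sketch, not the thesis's CTM: measuring channel sensitivity by the standard deviation of pooled activations over time, and gating with a sigmoid, are illustrative choices in place of the learned temporal modeling described in the abstract.

```python
import numpy as np

def channel_temporal_excitation(features):
    """Re-weight channels by how strongly they vary over time.

    features: (T, C, H, W). Channels whose activations change a lot
    across frames are assumed to carry temporal information and are
    amplified by a sigmoid gate.
    """
    # Squeeze spatial dims into a per-frame, per-channel descriptor: (T, C)
    desc = features.mean(axis=(2, 3))
    # Temporal variation of each channel: std over the time axis -> (C,)
    variation = desc.std(axis=0)
    gate = 1.0 / (1.0 + np.exp(-variation))  # sigmoid gate in (0.5, 1)
    # Excite: scale every channel by its temporal-sensitivity gate.
    return features * gate[None, :, None, None]
```

A channel that is constant over time gets the minimum gate (0.5 here), while channels that fluctuate across frames are boosted toward their full magnitude.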
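The MEM's differential-then-excite pattern with a residual connection can be sketched similarly. Again this is an illustrative NumPy version under assumed shapes; the absolute-difference pooling, the zero-padding, and the additive residual are stand-ins for the learned operations in the residual branch of the network.

```python
import numpy as np

def motion_excitation(features):
    """Difference along time, derive per-frame, per-channel motion weights,
    and add the excited features back to the input (residual connection),
    enhancing short-range motion.

    features: (T, C, H, W).
    """
    diff = features[1:] - features[:-1]            # temporal differences
    diff = np.concatenate([diff, np.zeros_like(features[:1])], axis=0)
    # Motion-sensitive weight per (frame, channel), squashed by a sigmoid.
    weight = 1.0 / (1.0 + np.exp(-np.abs(diff).mean(axis=(2, 3))))  # (T, C)
    # Residual excitation: original features plus their motion-gated copy.
    return features + features * weight[:, :, None, None]
```

The residual form means static content passes through unchanged (up to the baseline gate), so the module only emphasizes, never replaces, the input features.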
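The TSAM's group-split idea can also be sketched: split channels into two groups, process each with a different (here deliberately toy) operation, and connect the groups with a residual link. The circular spatial averaging and adjacent-frame temporal averaging below are placeholders for the learned group convolutions; assume an even channel count so the split is balanced.

```python
import numpy as np

def spatiotemporal_group_aggregation(features):
    """Channel-split spatiotemporal aggregation with a cross-group residual.

    features: (T, C, H, W), C assumed even. Group 1 gets a 'spatial'
    operation, group 2 a 'temporal' one, and the spatial output is added
    into the temporal group as a residual, enlarging the receptive field.
    """
    T, C, H, W = features.shape
    g1, g2 = features[:, :C // 2], features[:, C // 2:]
    # 'Spatial' group: circular 3-tap average along the height axis.
    spatial = (np.roll(g1, 1, axis=2) + g1 + np.roll(g1, -1, axis=2)) / 3
    # 'Temporal' group: average with the (circularly) previous frame,
    # plus the spatial group's output as a residual connection.
    temporal = (np.roll(g2, 1, axis=0) + g2) / 2 + spatial
    return np.concatenate([spatial, temporal], axis=1)
```

Because the temporal group sees the spatial group's output, features aggregated in one group propagate into the other, which is the receptive-field benefit the abstract attributes to the residual link between groups.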
Keywords/Search Tags: action recognition, motion features, attention mechanism, residual network