
Video Action Recognition Based On Multi-Stream Network Architecture

Posted on: 2024-07-01    Degree: Master    Type: Thesis
Country: China    Candidate: J Wang    Full Text: PDF
GTID: 2568307139996399    Subject: Engineering
Abstract/Summary:
Video action recognition is a crucial research direction in computer vision, with widespread applications in fields such as video surveillance, intelligent medical care, intelligent transportation, and human-computer interaction. The crux of the task is to extract motion features from the actions depicted in a video, yet videos often contain background clutter and irrelevant background objects that interfere with recognizing the target behavior. In addition, complex conditions such as target occlusion, lighting changes, and camera motion degrade the target's feature information and thus impair recognition accuracy. To address these problems, this paper proposes two novel deep-learning-based video action recognition approaches, presented as follows.

First, this paper proposes spatio-temporal target saliency-based multi-stream multiplier ResNets (STOMM-ResNets) for action recognition. The STOMM-ResNets model consists of three interacting streams: an appearance stream, a motion stream, and a spatio-temporal target saliency stream. As in the traditional two-stream CNN model, the appearance stream and the motion stream capture appearance information and motion information, respectively, while the spatio-temporal target saliency stream captures spatio-temporal target saliency information. Furthermore, to effectively exploit the spatio-temporal interaction information between streams, the model establishes interactive connections among the three streams, replacing the information fusion usually performed only at the final output layer. Two multiplicative connections are injected: the first from the motion stream to the appearance stream, and the second from the spatio-temporal target saliency stream to the appearance stream. STOMM-ResNets is evaluated on two standard video action recognition datasets, UCF101 and HMDB51, and the experimental results validate the effectiveness of the model.

Second, this paper proposes spatio-temporal target saliency-based multi-stream ResNets-LSTM (STOM-LSTM), which combines three streams (spatial, temporal, and spatio-temporal saliency) for video action recognition, capturing the foreground information of spatio-temporal objects in videos while suppressing background information. In addition, to capture long-term temporal dependencies between consecutive video frames, an attention-aware LSTM is applied on top of the spatio-temporal target saliency-based multi-stream ResNets. STOM-LSTM is evaluated on UCF101 and HMDB51 and compared with STOMM-ResNets and other models; on the same datasets it achieves comparable accuracy and better performance than STOMM-ResNets. The results show that the proposed STOM-LSTM model performs well.
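To make the two architectures concrete, the following minimal PyTorch sketches illustrate the mechanisms described above. They are illustrative assumptions based only on this abstract: the backbone choice (ResNet-18 trunks), sigmoid gating, fusion point, and feature dimensions are not specified in the thesis, so this is a sketch rather than the author's implementation. The first sketch shows the two multiplicative injections (motion to appearance, saliency to appearance) applied at the feature-map level instead of late score fusion.

# Sketch of the multiplicative cross-stream connections in STOMM-ResNets
# (assumed design details, not the thesis code).
import torch
import torch.nn as nn
import torchvision.models as models

def resnet_trunk():
    """ResNet-18 up to the last convolutional block (classifier head removed)."""
    backbone = models.resnet18(weights=None)
    return nn.Sequential(*list(backbone.children())[:-2])

class ThreeStreamMultiplier(nn.Module):
    """Appearance, motion and saliency streams with multiplicative injections:
    motion -> appearance and saliency -> appearance, fused before the classifier."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.appearance = resnet_trunk()   # RGB frames
        self.motion = resnet_trunk()       # optical flow (3-channel here for simplicity)
        self.saliency = resnet_trunk()     # spatio-temporal target saliency maps
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, rgb, flow, sal):
        a = self.appearance(rgb)           # (B, 512, H, W)
        m = self.motion(flow)
        s = self.saliency(sal)
        # Multiplicative connections: modulate appearance features by the
        # other two streams instead of fusing scores at the output layer.
        a = a * torch.sigmoid(m)           # injection 1: motion -> appearance
        a = a * torch.sigmoid(s)           # injection 2: saliency -> appearance
        feat = self.pool(a).flatten(1)
        return self.classifier(feat)

# Usage: a mini-batch of 8 frames with matching flow and saliency inputs.
model = ThreeStreamMultiplier(num_classes=101)       # e.g. UCF101
rgb = torch.randn(8, 3, 224, 224)
flow = torch.randn(8, 3, 224, 224)
sal = torch.randn(8, 3, 224, 224)
print(model(rgb, flow, sal).shape)                    # torch.Size([8, 101])

The second sketch shows one plausible form of the attention-aware LSTM head used in STOM-LSTM: an LSTM models temporal dependencies across per-frame multi-stream features, and a learned per-timestep attention score re-weights the hidden states before classification. The feature and hidden sizes are placeholders.

# Sketch of an attention-weighted LSTM readout over per-frame features
# (an assumed formulation of the "attention-aware LSTM" in the abstract).
import torch
import torch.nn as nn

class AttentionLSTMHead(nn.Module):
    def __init__(self, feat_dim=512, hidden=256, num_classes=101):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)             # per-timestep attention score
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, frame_feats):                   # (B, T, feat_dim)
        h, _ = self.lstm(frame_feats)                 # (B, T, hidden)
        w = torch.softmax(self.attn(h), dim=1)        # (B, T, 1) attention weights
        pooled = (w * h).sum(dim=1)                   # attention-weighted temporal pooling
        return self.classifier(pooled)

# Usage: 8 clips, 16 frames each, 512-d fused per-frame features.
head = AttentionLSTMHead()
print(head(torch.randn(8, 16, 512)).shape)            # torch.Size([8, 101])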
Keywords/Search Tags: video action recognition, multiple streams, spatio-temporal target saliency, spatio-temporal interaction information, attention-aware long short-term memory network