Spatiotemporal Squeeze-and-Excitation Residual Multiplier Networks For Video Action Recognition

Posted on: 2020-09-13

Degree: Master

Type: Thesis

Country: China

Candidate: K Tong

Full Text: PDF

GTID: 2428330575494239

Subject: Electronic and communication engineering

Abstract/Summary:

As one of the main carriers of information, video is shared ever more widely, so understanding and analyzing these massive amounts of video data is crucial. Human action recognition in videos has become a challenging research topic in computer vision. It is widely used not only in video information retrieval, daily life security, and public video surveillance, but also in human-computer interaction, scientific cognition, and other fields.

First, the research background, significance, and difficulties of action recognition are briefly introduced, and deep-learning-based action recognition methods are then comprehensively reviewed from three aspects: the types and numbers of input signals, the combination with traditional feature extraction methods, and the datasets used for pre-training. The performance of some typical methods on the UCF101 and HMDB51 datasets is also summarized and analyzed. Finally, possible future research directions are discussed from three perspectives: video data preprocessing, the representation of human motion features in video, and model training. This summary and analysis of current deep-model-based video action recognition methods is intended as a reference for researchers in the field.

The two-stream deep model, which combines temporal and spatial information, is the most typical method in video action recognition. Building on the two-stream network structure, a spatiotemporal squeeze-and-excitation residual multiplier network for action recognition is proposed, which effectively improves performance. The squeeze-and-excitation residual network learns spatial and temporal features better than the shallow networks or traditional deep networks used for action recognition. Long-term temporal dependence is captured by injecting an identity mapping kernel into the network model as a temporal filter. In the feature-level fusion phase of the two-stream network, spatiotemporal feature multiplication fusion is used to further enhance the interaction between the spatial and temporal information of the squeeze-and-excitation residual networks. In addition, extensive ablation experiments were conducted to study how the method, number, and location of the spatial-temporal stream multiplication fusion affect the performance of the proposed model. Three different strategies are also proposed for generating model ensembles, and the average and the weighted average of an ensemble's results are computed to obtain the final recognition result. Experimental results on the UCF101 and HMDB51 datasets show that the proposed method performs well in video action recognition.
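The three mechanisms named in the abstract (squeeze-and-excitation gating, multiplicative fusion of the two streams, and averaging over a model ensemble) can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the abstract gives no layer details, so the function names, the tensor shape `(C, H, W)`, the reduction ratio `r`, the random weights, and the single fusion point are all assumptions made for illustration. The identity mapping kernel used as a temporal filter is not shown.

```python
import numpy as np

def se_gate(x, w1, w2):
    """Squeeze-and-excitation on a feature map x of shape (C, H, W).

    w1 has shape (C // r, C) and w2 has shape (C, C // r); both are
    hypothetical weights standing in for learned parameters.
    """
    z = x.mean(axis=(1, 2))                # squeeze: global average pool -> (C,)
    h = np.maximum(w1 @ z, 0.0)            # excitation: bottleneck layer + ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))    # sigmoid -> per-channel weights in (0, 1)
    return x * s[:, None, None]            # rescale each channel

def multiplicative_fusion(spatial, temporal):
    """Element-wise product of same-shape spatial and temporal features."""
    return spatial * temporal

def ensemble_scores(scores, weights=None):
    """Combine class scores of shape (n_models, n_classes).

    Returns the plain average, or the weighted average when weights are given.
    """
    scores = np.asarray(scores, dtype=float)
    if weights is None:
        return scores.mean(axis=0)
    weights = np.asarray(weights, dtype=float)
    return (weights[:, None] * scores).sum(axis=0) / weights.sum()

# Toy sizes and random values, chosen only so the sketch runs.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
spatial_in = rng.standard_normal((C, H, W))
temporal_in = rng.standard_normal((C, H, W))

spatial = se_gate(spatial_in,
                  rng.standard_normal((C // r, C)),
                  rng.standard_normal((C, C // r)))
temporal = se_gate(temporal_in,
                   rng.standard_normal((C // r, C)),
                   rng.standard_normal((C, C // r)))
fused = multiplicative_fusion(spatial, temporal)

# Two hypothetical models voting over two classes.
avg = ensemble_scores([[0.2, 0.8], [0.6, 0.4]])
weighted = ensemble_scores([[0.2, 0.8], [0.6, 0.4]], weights=[3, 1])
```

Because the fusion is a product rather than a sum, the gradient reaching each stream is scaled by the other stream's activations, which is one way to read the "interaction" between spatial and temporal information that the abstract describes. The ablation experiments mentioned above vary how, how often, and where such fusion is applied; the sketch fixes a single fusion point for brevity.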

Keywords/Search Tags:

Action recognition, Spatiotemporal stream, Squeeze-and-Excitation residual networks, Multiplication fusion, Multi-model ensemble

Related items

1	Research On Video Action Recognition Methods Based On Two-stream Networks
2	Research Of Video Action Recognition Based On Two-stream Information Fusion Network
3	Temporal Action Localization And Action Recognition Based On Deep Learning
4	Research On Behavior Recognition Algorithm Based On Two-stream Convolutional Neural Network
5	Human Action Recognition Based On Spatiotemporal Two Stream Convolution Network
6	Action Recognition Based On Spatiotemporal Convolution Networks
7	Research On Action Recognition Method Based On Multi-feature Fusion
8	Research On Human Action Recognition Based On Video Stream
9	Video Action Recognition System Based On Surveillance Scene
10	Research On Action Recognition Algorithm Based On Spatiotemporal Modeling And Its Application