Research On Algorithm Of Human Action Recognition Based On Video

Posted on: 2021-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Li
GTID: 2428330626456039
Subject: Signal and Information Processing
Abstract/Summary:
Video-based human action recognition is a frontier topic in computer vision and a highly challenging one. It has broad application prospects in intelligent surveillance systems, content-based video retrieval, and human-computer interaction. This thesis studies the problem from two directions: traditional machine learning algorithms and deep learning algorithms. The main research contents are as follows:

1. Fast iDT algorithm. The iDT (improved Dense Trajectories) algorithm is currently the best-performing classical machine learning algorithm for human action recognition, so this thesis adopts the iDT framework as the base framework for its machine-learning-based research. However, iDT requires dense optical flow, which is very time-consuming to compute. To speed up the algorithm, this thesis replaces the dense optical flow in iDT with a motion descriptor that takes far less time to compute while performing close to dense optical flow: the motion descriptor derived from video compression, MPEG Flow.

2. Deep aggregation network based on temporal segmentation. Deep convolutional networks still face two problems in human action recognition: (1) they recognize actions with a long temporal structure poorly; (2) they cannot effectively capture the correlations between the sub-features of a human action. To address these problems, this thesis proposes an end-to-end two-stream network, the deep aggregation network based on temporal segmentation. It consists mainly of two sub-networks: the deep network based on temporal segmentation and the deep aggregation network. The deep network based on temporal segmentation addresses problem (1). Because it samples video frames according to a temporal segmentation strategy, it obtains several sparse video sub-sequences that together cover the whole video, and features are extracted from each sub-sequence by a two-stream network. The deep aggregation network addresses problem (2); it is implemented mainly by introducing NetVLAD as the feature aggregation layer. Compared with other aggregation methods such as max pooling and average pooling, NetVLAD pays more attention to the relationships among local features.

3. Improvements to the deep aggregation network based on temporal segmentation. This thesis improves the algorithm in two respects: introducing an attention mechanism and changing the loss function. The spatio-temporal self-attention mechanism, inspired mainly by Non-local Neural Networks, is introduced to improve NetVLAD's ability to capture the important spatio-temporal features of a video. The joint loss function containing Center Loss is introduced because it has worked well in another classification task, face recognition, where it allows more discriminative features to be learned.
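As an illustration of where MPEG Flow enters the iDT pipeline: iDT uses the flow field both to track dense trajectories and to compute flow-based descriptors such as HOF and MBH, so the substitution amounts to changing where that flow field comes from. The sketch below is not the thesis implementation. It assumes the motion vectors have already been parsed from the compressed stream into a hypothetical array `mv` of shape (h, w, 2), one vector per 16x16 macroblock, upsamples them to pixel resolution by repetition, and computes a simplified single-cell HOF (8 magnitude-weighted orientation bins plus one zero-motion bin). The function names and the `min_mag` threshold are illustrative, and iDT's spatio-temporal cell grid is omitted.

```python
import numpy as np


# Hypothetical helpers: motion-vector parsing from the bitstream is assumed done elsewhere.
def mpeg_flow_to_dense(mv, block=16):
    """Upsample per-macroblock motion vectors of shape (h, w, 2) to a
    pixel-level flow field of shape (h*block, w*block, 2) by repetition."""
    return np.repeat(np.repeat(mv, block, axis=0), block, axis=1)


def hof(flow, n_bins=8, min_mag=1e-3):
    """Single-cell histogram of optical flow: n_bins orientation bins weighted
    by flow magnitude, plus one extra bin counting near-static pixels."""
    dx, dy = flow[..., 0], flow[..., 1]
    mag = np.hypot(dx, dy)
    ang = np.mod(np.arctan2(dy, dx), 2 * np.pi)               # orientation in [0, 2*pi)
    bins = (ang // (2 * np.pi / n_bins)).astype(int) % n_bins
    hist = np.zeros(n_bins + 1)
    moving = mag >= min_mag
    np.add.at(hist, bins[moving], mag[moving])                 # magnitude-weighted votes
    hist[n_bins] = np.count_nonzero(~moving)                   # zero-motion bin, unit weight
    total = hist.sum()
    return hist / total if total > 0 else hist                 # L1 normalisation


# Usage: a 2x3 grid of 16x16 macroblocks in which only the top-left block moves right.
mv = np.zeros((2, 3, 2))
mv[0, 0] = [4.0, 0.0]
flow = mpeg_flow_to_dense(mv)    # shape (32, 48, 2)
descriptor = hof(flow)           # 9-dimensional, sums to 1
```

The speed advantage comes from skipping flow estimation entirely: the vectors are already in the bitstream, at the cost of a coarse, block-level field.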
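The temporal-segmentation sampling strategy can be sketched as follows. The abstract does not give the exact scheme, so this assumes a Temporal Segment Networks (TSN) style sampler: the frame range is split into K equal-length segments, one frame index is drawn at random from each segment during training, and the middle frame of each segment is used at test time. The K snippets therefore sparsely cover the whole video regardless of its length. `sample_segments` and its parameters are hypothetical names.

```python
import random


def sample_segments(num_frames, num_segments, train=True, rng=None):
    """Split [0, num_frames) into num_segments equal-length segments and
    return one frame index per segment (random when training, middle otherwise)."""
    if num_frames < 1 or num_segments < 1:
        raise ValueError("num_frames and num_segments must be positive")
    rng = rng if rng is not None else random.Random()
    seg_len = num_frames / num_segments                  # may be fractional
    indices = []
    for k in range(num_segments):
        start = int(seg_len * k)
        end = max(start + 1, int(seg_len * (k + 1)))     # at least one frame per segment
        if train:
            idx = rng.randrange(start, end)              # random snippet inside segment k
        else:
            idx = (start + end - 1) // 2                 # deterministic middle frame
        indices.append(min(idx, num_frames - 1))
    return indices


# Usage: 3 snippets from a 90-frame clip. In a two-stream setup each index would select
# an RGB frame for the spatial stream and a stacked optical-flow window for the temporal stream.
test_idx = sample_segments(90, 3, train=False)           # [14, 44, 74]
train_idx = sample_segments(90, 3, rng=random.Random(0))
```

Random sampling within each segment acts as temporal data augmentation during training, while the fixed middle frames keep evaluation deterministic.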
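NetVLAD aggregation can be sketched in NumPy following the published NetVLAD formulation. Each local descriptor x_i is softly assigned to K cluster centres c_k with weights a_k(x_i) = softmax_k(w_kᵀx_i + b_k). The residuals are then summed as V(k) = Σ_i a_k(x_i)(x_i − c_k), and the result is intra-normalised per cluster and finally L2-normalised. In the thesis the descriptors would be the per-segment two-stream features; here `x`, `w`, `b`, and `c` are random placeholders, and the function name is hypothetical.

```python
import numpy as np


def netvlad(x, w, b, c, eps=1e-12):
    """NetVLAD pooling of N local descriptors.
    x: (N, D) descriptors; w: (D, K) and b: (K,) soft-assignment parameters;
    c: (K, D) cluster centres. Returns a (K*D,) L2-normalised vector."""
    logits = x @ w + b                                         # (N, K)
    logits = logits - logits.max(axis=1, keepdims=True)        # numerical stability
    a = np.exp(logits)
    a = a / a.sum(axis=1, keepdims=True)                       # soft assignment a_k(x_i)
    # V[k] = sum_i a_k(x_i) * (x_i - c_k), computed for all k at once
    v = a.T @ x - a.sum(axis=0)[:, None] * c                   # (K, D)
    v = v / (np.linalg.norm(v, axis=1, keepdims=True) + eps)   # intra-normalisation
    v = v.reshape(-1)
    return v / (np.linalg.norm(v) + eps)                       # global L2 normalisation


# Usage with random placeholders: 7 segment features of dimension 16, K = 4 clusters.
rng = np.random.default_rng(0)
N, D, K = 7, 16, 4
x = rng.standard_normal((N, D))
w = rng.standard_normal((D, K))
b = np.zeros(K)
c = rng.standard_normal((K, D))
desc = netvlad(x, w, b, c)       # shape (64,)
```

The contrast with pooling is visible in the output shape. Max or average pooling would reduce the N descriptors to a single D-dimensional vector. NetVLAD instead keeps a K x D matrix that records how the descriptors are distributed around each centre, while remaining invariant to the order of the descriptors.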
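The spatio-temporal self-attention borrowed from Non-local Neural Networks can be sketched with the embedded-Gaussian form of the non-local operation, y_i = Σ_j softmax_j(θ(x_i)ᵀφ(x_j)) g(x_j), followed by a residual connection z = W_z y + x. The T x H x W positions of a feature map are flattened into one axis, so every position attends to every other position across both space and time. This is a minimal sketch: the linear maps are random placeholder matrices, the function names are hypothetical, and the abstract does not specify where the thesis places this block relative to NetVLAD.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def nonlocal_block(x, w_theta, w_phi, w_g, w_z):
    """Embedded-Gaussian non-local block over a (T, H, W, C) feature map.
    w_theta, w_phi, w_g: (C, C') embeddings; w_z: (C', C) output projection."""
    t, h, w, ch = x.shape
    flat = x.reshape(-1, ch)                       # (T*H*W, C): every spatio-temporal position
    theta, phi, g = flat @ w_theta, flat @ w_phi, flat @ w_g
    attn = softmax(theta @ phi.T, axis=1)          # pairwise affinities, rows sum to 1
    y = attn @ g                                   # y_i = sum_j attn_ij * g(x_j)
    z = y @ w_z + flat                             # residual connection: z = W_z y + x
    return z.reshape(t, h, w, ch)


# Usage: 4 frames of 7x7 features with 32 channels, embedding size 16 (placeholders).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 7, 7, 32))
w_theta, w_phi, w_g = (0.1 * rng.standard_normal((32, 16)) for _ in range(3))
w_z = 0.1 * rng.standard_normal((16, 32))
out = nonlocal_block(x, w_theta, w_phi, w_g, w_z)   # same shape as x
```

Because of the residual connection, the block returns its input unchanged when W_z is all zeros, so it can be added to an existing network without altering that network's initial behaviour. Note that the attention matrix is (T·H·W) x (T·H·W), which grows quadratically with the number of positions.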
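The joint loss can be written, following the original Center Loss formulation from face recognition, as L = L_S + λ·L_C with L_C = ½ Σ_i ‖x_i − c_{y_i}‖². Here L_S is the softmax cross-entropy, x_i is the feature of sample i, and c_{y_i} is the centre of its class. The centres are updated per class as c_j ← c_j − α·Σ_{i: y_i=j}(c_j − x_i)/(1 + n_j). The NumPy sketch below computes both terms for one mini-batch and applies that update. The λ and α values and all names are illustrative assumptions, not the thesis settings.

```python
import numpy as np


def joint_loss(feats, logits, labels, centers, lam=0.01):
    """L = L_S + lam * L_C, both summed over the mini-batch.
    feats: (m, d); logits: (m, n_cls); labels: (m,) int; centers: (n_cls, d)."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    l_s = -log_probs[np.arange(len(labels)), labels].sum()    # softmax cross-entropy
    diff = feats - centers[labels]                             # x_i - c_{y_i}
    l_c = 0.5 * np.sum(diff ** 2)                              # center loss
    return l_s + lam * l_c, l_s, l_c


def update_centers(feats, labels, centers, alpha=0.5):
    """Per-class update c_j <- c_j - alpha * sum_{i: y_i = j} (c_j - x_i) / (1 + n_j)."""
    new = centers.copy()
    for j in np.unique(labels):
        mask = labels == j
        delta = (centers[j] - feats[mask]).sum(axis=0) / (1 + mask.sum())
        new[j] = centers[j] - alpha * delta
    return new


# Usage on a toy batch of three 2-D features from two classes.
feats = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]])
labels = np.array([0, 0, 1])
centers = np.zeros((2, 2))
logits = np.zeros((3, 2))
total, l_s, l_c = joint_loss(feats, logits, labels, centers, lam=0.1)  # l_c = 7.0
centers = update_centers(feats, labels, centers)  # [[2/3, 0], [0, 0.5]]
```

The softmax term keeps classes separable, while the center term pulls features of the same class together. This combination is the sense in which the joint loss yields more discriminative features.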
Keywords/Search Tags: Action Recognition, Fast IDT Algorithm, MPEG Flow, Temporal Segmentation, Deep Aggregation Network, Spatio-Temporal Self-attention Mechanism, Center Loss