
Action Recognition Methods For Videos

Posted on: 2022-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: J J Zhao
Full Text: PDF
GTID: 2518306740982619
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the economy and technology, video has become one of the most important ways to disseminate information in contemporary society. Video action recognition is a key technology for the intelligent analysis of video data, and it can be widely used in scenarios such as smart security, smart driving, and smart human-computer interaction. The key problems to be solved in the field of video action recognition are as follows: (1) how to extract temporal information from videos effectively; (2) how to make video action recognition methods more efficient; (3) how to improve the robustness and generalization of video action recognition methods. This thesis approaches these problems from the perspectives of video frame sampling and neural network regularization, and proposes two novel video action recognition methods. The main contributions of this thesis are as follows.

(1) A video action recognition method based on Dense Segmental Sampling is proposed. Limited by the computing power of current hardware, deep learning based video action recognition methods usually sample a subset of frames from the video to train models and perform inference. Through experiments, we find that the existing dense sampling and segmental sampling strategies are complementary in the information they extract from videos, so we propose Dense Segmental Sampling (DSS), which combines dense sampling and segmental sampling in a unified framework. DSS can simultaneously extract long-term temporal information and local contextual information from videos. To make efficient use of the clips sampled by DSS, this thesis further proposes a novel neural network architecture called the Temporal Dense Segment Network (TDSN). It takes the two kinds of clips sampled by DSS as input, uses two subnets to extract the long-term temporal information and the local contextual information respectively, and merges this information through fusion modules. In the experiments, TDSN achieves excellent results in comparison with state-of-the-art methods, which encourages further research on video frame sampling methods.

(2) A video action recognition method based on Temporal Structure Dropout is proposed. The video features extracted by 3D convolutional neural networks contain spatial information that is prone to over-fitting, which prevents temporal information from being extracted effectively. Existing regularization methods are not effective here because they are not designed specifically for video data. To solve this problem, this thesis proposes a novel neural network regularization method called Temporal Structure Dropout (TSD). TSD can be added to existing neural networks as a module. By reducing the dominant spatial information in the video features, TSD makes the neural network pay more attention to learning temporal structure and extract more effective temporal information for action recognition. The experiments show that adding the TSD module to a neural network alleviates the over-fitting problem and improves action recognition performance, while adding only limited training cost. TSD is therefore an efficient regularization method for video action recognition.
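To illustrate the two sampling strategies that DSS combines, the following is a minimal sketch that only computes frame indices. The abstract does not give the exact sampling procedure, so the function names, the default values (8 segments, an 8-frame dense clip with stride 2), and the choice of one random frame per segment are all assumptions made for illustration, not the thesis implementation.

```python
import random


def segmental_sample(num_frames, num_segments, rng):
    """Split the video into equal segments and pick one random frame per segment.

    The indices cover the whole video, so they carry long-term temporal information.
    """
    indices = []
    for k in range(num_segments):
        start = k * num_frames // num_segments
        # Guard against empty segments when the video is very short.
        end = max(start + 1, (k + 1) * num_frames // num_segments)
        indices.append(rng.randrange(start, end))
    return indices


def dense_sample(num_frames, clip_len, stride, rng):
    """Pick a random start frame and take clip_len frames at a fixed small stride.

    The indices are close together, so they carry local contextual information.
    """
    span = (clip_len - 1) * stride + 1
    max_start = max(0, num_frames - span)
    start = rng.randint(0, max_start)
    # Clamp to the last frame if the video is shorter than the clip span.
    return [min(start + i * stride, num_frames - 1) for i in range(clip_len)]


def dense_segmental_sample(num_frames, num_segments=8, clip_len=8, stride=2, seed=0):
    """Hypothetical combination: return both kinds of clips for one video."""
    rng = random.Random(seed)
    return {
        "segmental": segmental_sample(num_frames, num_segments, rng),
        "dense": dense_sample(num_frames, clip_len, stride, rng),
    }


if __name__ == "__main__":
    clips = dense_segmental_sample(num_frames=300)
    print("segmental clip:", clips["segmental"])
    print("dense clip:    ", clips["dense"])
```

In a two-subnet design such as the one described for TDSN, the segmental indices would feed the subnet responsible for long-term temporal information and the dense indices the subnet responsible for local context. How the two are fused is not specified in the abstract and is left out of this sketch.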
Keywords/Search Tags: Video action recognition, Video frame sampling, Regularization methods, Deep learning