Font Size: a A A

Action Recognition In Video Based On Deep Neural Network

Posted on:2022-08-13Degree:MasterType:Thesis
Country:ChinaCandidate:J Y FengFull Text:PDF
GTID:2518306536972559Subject:Engineering
Abstract/Summary:PDF Full Text Request
With the popularity of surveillance,the security demand based on video data has been expanded step by step.In order to make full use of these infrastructure as sensing organs,video-based action recognition has become a hot topic in academic research.At the same time,with the development of unmanned driving,the analysis of pedestrian behavior on the road is also important.The action recognition in this scene ensures the risk prevention coefficient of the vehicle.With the aging era around the world,the market of home service robots has gradually expanded.In order to serve human better,the robot applies action analysis on human beings is required to be higher.Therefore,video-based action recognition has a lot of needs in many real-world scenarios.Therefore,it is necessary to explore high reliability and high-performance action recognition algorithm.Through the research of action recognition algorithm,the machine is more automatic and intelligent,which brings practical value to the industry of placement,unmanned driving,service robot and so on.Therefore,it is of great significance to study the action recognition algorithm.At the same time,with the great achievements in various fields,and thanks to the strong network structure of feature learning and feature expression ability,deep learning has achieved ideal results in action recognition.Therefore,this paper proposes a video-based action recognition algorithm based on deep neural network.The main work of this paper includes:(1)The significance of video-based action recognition is expounded.The research status of video-based action recognition algorithm is fully investigated.The traditional feature-based action recognition algorithm and deeplearning-based action recognition algorithm are summarized and analyzed.(2)The action recognition algorithm based on mixed spatial attention and hierarchical temporal aggregation(MSAHTA)is proposed.In view of the lack of single and inefficient local spatial feature capture and temporal modeling,this paper proposes a action recognition algorithm based on mixed spatial attention and hierarchical temporal aggregation.The specific performance is that the local spatial attention module(SA)is proposed to enhance the local discrimination based on the high-level semantic features extracted by 2D CNN.At the same time,aiming at the single problem of temporal modeling,a hierarchical temporal aggregation module(HTA)is proposed to aggregate the temporal information on multiple scales and combine them into the feature of rich scale temporal,which enhances the ability of 2D CNN.The SA and HTA together constitute the 2D CNN action recognition model which can be embedded flexibly by MSAHTA module,which can enhance its spatial expression ability and the ability of modeling temporal relationship.We verify the validity of our method on UCF101,HMDB51,and carry out a lot of component analysis experiments.(3)The action recognition algorithm based on temporal saliency integration efficient spatiotemporal modeling is an important yet challenging problem for video action recognition.Existing state-of-the-art methods exploit motion clues to assist in short-term temporal modeling through temporal difference over consecutive frames.However,background noises will be inevitably introduced due to the camera movement.Besides,movements of different actions can vary greatly.In this paper,we propose a Temporal Saliency Integration(TSI)block,which mainly contains a Salient Motion Excitation(SME)module and a Cross-scale Temporal Integration(CTI)module.Specifically,SME aims to highlight the motion-sensitive area through local-global motion modeling,where the background suppression and pyramidal feature difference are conducted successively between neighboring frames to capture motion dynamics with less background noises.CTI is designed to perform multi-scale temporal modeling through a group of separate 1D convolutions respectively.Meanwhile,temporal interactions across different scales are integrated with attention mechanism.Through these two modules,long short-term temporal relationships can be encoded efficiently by introducing limited additional parameters.Extensive experiments are conducted on several popular benchmarks(i.e.,Something-Something v1 & v2,Kinetics-400,UCF-101,and HMDB-51),which demonstrate the effectiveness and superiority of our proposed method.(4)Aiming at the action recognition algorithm proposed in this paper,a action recognition algorithm framework based on deep learning is designed and implemented,which is used for researchers to quickly complete the algorithm implementation,and also convenient for developers to implement the project.
Keywords/Search Tags:Deep learning, temporal modeling, action analysis, video understanding
PDF Full Text Request
Related items